Abstract — The work identifies the first general, explicit, and
non-random MIMO encoder-decoder structures that guarantee
optimality with respect to the diversity-multiplexing tradeoff
(DMT), without employing a computationally expensive
maximum-likelihood (ML) receiver. Specifically, the work es- 
tablishes the DMT optimality of a class of regularized lattice 
decoders, and more importantly the DMT optimality of their 
lattice-reduction (LR)-aided linear counterparts. The results hold 
for all channel statistics, for all channel dimensions, and most 
interestingly, irrespective of the particular lattice-code applied. 
As a special case, it is established that the LLL-based LR- 
aided linear implementation of the MMSE-GDFE lattice decoder 
facilitates DMT optimal decoding of any lattice code at a worst- 
case complexity that grows at most linearly in the data rate. This 
represents a fundamental reduction in the decoding complexity 
when compared to ML decoding whose complexity is generally 
exponential in rate. 

The generality of the results makes them applicable to a plethora
of pertinent communication scenarios such as quasi-static 
MIMO, MIMO-OFDM, ISI, cooperative-relaying, and MIMO- 
ARQ channels, in all of which the DMT optimality of the LR- 
aided linear decoder is guaranteed. The adopted approach yields 
insight, and motivates further study, into joint transceiver designs 
with an improved SNR gap to ML decoding. 

Index Terms — Diversity-multiplexing tradeoff, lattice decoding, 
linear decoding, lattice reduction, regularization, multiple-input 
multiple-output (MIMO), space-time coders-decoders. 

I. Introduction 
The general multi-dimensional linear channel model

$$ y = Hx + w $$

adequately represents a plethora of communication system 
models which utilize multi-dimensional transmit-receive sig- 
nals for attaining increased rates and reliability in the presence 
of fading. Such system models include quasi-static MIMO, 
MIMO-OFDM, ISI, amplify-and-forward (AF), decode-and- 
forward (DF), and MIMO automatic repeat request (ARQ) 
models. Each of the above models introduces its own structure 
on H and x, its own error performance limits, and its 
own requirements on coding and decoding schemes. Finding 
general-purpose transceiver structures with (provably) good
performance in these scenarios, and with a reasonable com- 
putational complexity, is challenging. 

A. Background and previous work 

Substantial amounts of work have focused on identifying 
performance criteria and constructing different coding schemes 
specifically suited to the different system models. For example 
in the case of the $n_T \times n_R$ quasi-static MIMO channel, we have
seen the orthogonal space-time (ST) designs [1], [2] providing 
full diversity but doing so only at rates much less than those 
theoretically possible, codes like V-BLAST [3] providing full 
rate MIMO benefits but with much reduced diversity, and 
codes from the general linear dispersion designs [4] providing 
full rate benefits but no diversity guarantees for increasing 
spectral efficiencies. 

In outage limited communications systems, the fundamental 
limits with respect to the spectral efficiency and decoding 
error probability in the high signal-to-noise ratio (SNR) limit 
were succinctly characterized by Zheng and Tse's diversity 
multiplexing tradeoff (DMT) [5]. The tradeoff incorporated 
several previous performance measures and has been exten- 
sively adopted ever since as a benchmark for transceiver design 
and analysis. The work in [5] also introduced the notion 
of DMT optimal designs, i.e., designs capable of achieving 
the fundamental DMT of the underlying channel (c.f., [5] or
Section II-B).

1 ) Coding: Towards finding DMT optimal codes, the work 
in [5] proved the existence of such codes for the case of the 
i.i.d. Rayleigh fading quasi-static MIMO channel by using 
ensembles of random Gaussian codes over a finite coding 
duration, and thus reduced system model dimensionality. Al- 
though providing codes of finite length, such a construction 
is highly impractical given the lack of structure that would 
allow for practical codeword enumeration and decoding. This 
issue was addressed in [6] which, for the same setting, proved 
the existence of random ensembles of DMT optimal codes 
that accept a lattice structure. The same work successfully 
identified the suitability of the lattice framework for MIMO 
coding problems, and its effect on issues such as that of 
finding efficient shaping regions for the transmitted signals. 
However, random lattice designs inherently rely on different 
lattices for each rate and SNR and, furthermore, do not provide 
deterministic means by which to identify the lattice generator 
matrices. 
These two issues were conclusively solved in [7], [8] which
first provided practical construction criteria for DMT optimal 
codes for the quasi-static Rayleigh fading MIMO channel, 
and then explicitly constructed the first unified family of 
DMT optimal codes for all channel dimensions. These cyclic 
division algebra (CDA)-based codes, which were built based 
on the work of [9]-[11], managed to employ for any given
number of transmit antennas $n_T$ a single lattice generator
matrix which is easy to identify. Furthermore, these codes 
guarantee DMT optimality for all fading statistics, due to the 
fact that they satisfy the approximate universality criterion of 
[12]. Other CDA codes [13], and later constructed variants of 
CDA-based codes [14]-[16], currently perform best among all 
existing ST codes. Specifically, the perfect ST codes proposed 
in [13], and later extended in [14], allow for approximate 
universality as well as information losslessness (c.f., [17]) 
for rotationally invariant ST channels. Later work in [15] 
employed the perfect ST code architecture, together with the 
lattice space-time (LAST) code framework in [6], to provide 
for an improved shaping region and better performance at 
lower values of SNR. Furthermore the work in [16] drew ST 
codes from subsets of CDAs that constitute maximal orders, 
which interestingly ensure a better fundamental volume of 
the corresponding lattice, and better energy efficiency [16]. 
The above DMT optimal codes form the basis for modified 
schemes that DMT optimally apply to different system models 
[18]-[22]. 

The codes discussed above have to date only been shown to 
provide DMT optimality in the presence of an ML decoder,¹
and hence decoding complexity has remained the fundamental 
limitation in obtaining (provably) good decoding error proba- 
bility performance in a computationally efficient manner. This 
limitation, roughly speaking, originates from the fact that such 
codes must in general be drawn, due to enumerability and 
rate requirements, from lattices whose dimension "matches" 
the inherently high dimension of H. On top of that, in all 
but rare cases, the diversity requirements force code-channel 
lattices that cannot be decomposed into substantially "smaller" 
and simpler component lattices, without severely sacrificing 
rate gains. The high dimensionality, in conjunction with the 
high spectral efficiency that is envisioned in future telecommunications,
introduces prohibitive ML decoding complexity.

2) Decoding: While sphere decoding (SD) methods [23]- 
[25], that perform a limited branch-and-bound type search 
within a hyper-sphere around the received vector, have been 
developed to provide ML decoding at reduced average com- 
plexity, they remain impractical for dense constellations, low- 
SNR and ill-conditioned or singular channel realizations [23]- 
[27]. This is mainly because they implement an exact solution 
to a closest vector problem (CVP) for each transmitted code- 
word. 

Substantial interest has been drawn by linear receivers 
based on the zero-forcing (ZF) or the minimum mean square 
error (MMSE) criteria, as these receivers avoid exact CVP 
solutions, and thus allow for simple implementation (c.f., [28]
and references therein). An inherent limitation of ZF-based
linear receivers is that ill-conditioned channel matrices lead to
substantial noise amplification. This motivated the introduction
of MMSE-based linear receivers which can be seen as ZF
receivers that take into consideration the presence of additive
noise and hence utilize a better-conditioned equivalent channel
matrix. For ill-conditioned channel matrices, however, both of
these linear receivers, as well as receivers based on successive
interference cancellation (SIC), are for the most part
substantially suboptimal, as the recent DMT analysis in [29]
reveals.

¹A notable exception is the family of random LAST codes in [6], as
discussed in Section I-A3 and throughout the present work.

Notable steps towards better performing efficient receivers 
included the introduction of lattice-reduction (LR) techniques 
in [30], [31]. Motivated by the fact that ZF is optimal in 
the presence of orthogonal channels, the work in [30], [31] 
proposed the use of LR methods for better, nearly orthogonal 
conditioning of the equivalent channel matrix, prior to simple 
ZF or SIC decoding. This approach was partly validated by 
simulations (c.f., [25]) and by analysis as in [32] which 
showed that LR-aided ZF decoding can achieve maximal 
receive diversity for fixed-rate uncoded V-BLAST. LR-aided 
ZF decoding or naive lattice decoding is, however, not DMT 
optimal in general [6], [33]. The work in [24], [25], [34] 
proposed lattice decoding with MMSE-GDFE pre-processing 
which is well suited for the case of under-determined or 
singular channels. Contemporary work on LR-aided decoding 
in an MMSE pre-processed basis appeared in [35]. Simulation 
results indicated that such methods are capable of near-ML 
performance at a computational complexity that remains low 
[15], [24], [25], [35]. 

3) Codes with reduced decoding complexity: Several works 
focused on providing codes with reduced ML decoding com- 
plexity. Such work includes the multi-group decodable codes 
based on Clifford algebras in [36], and the codes in [37] 
for asymmetric ($n_R < n_T$) quasi-static MIMO channels.
Similarly motivated work in [38] identified existing $2 \times 2$ full-rate
full-diversity codes for the $2 \times 2$ MIMO channel [39]-[41]
as fast decodable codes, since they incur reduced sphere
decoding complexity by essentially reducing the dimension- 
ality of the search space from 8 real dimensions to 6 real 
dimensions. This reduction is achieved by linearly combining 
two Alamouti style twisted codes, such that the corresponding 
QR decomposition employed in SD, yields a sparse R matrix. 
The sparseness property was shown to be unique to the case 
of $n_T = T = 2$, where $n_T$ and $T$ denote the number of
transmit antennas and the coding duration respectively, and
further extensions to the $4 \times 2$ MIMO channel came at the
expense of reduced diversity [38]. 

Towards bridging the gap between ML and linear decoders, 
a hybrid transceiver was proposed in [42] to jointly employ an 
ML and an unbiased MMSE-SIC receiver, on an infinitely long 
($T \to \infty$) D-BLAST style $n_T \times T$ space-time spreading (STS)
code with an underlying QAM constellation. This hybrid
transceiver allows for partial reduction in decoding complexity,
and provides DMT optimality with $2n_T$-dimensional ML
decoding (in every time slot). For the case where $n_R > n_T$,
a pure ML receiver would generally incur a dimensionality of
$2 n_T T$ real symbols.
One of the most fundamentally important steps towards
establishing that DMT optimality can be achieved with compu- 
tationally efficient encoders and decoders was, however, given 
in [6]. In the setting of the i.i.d. Rayleigh fading quasi-static 
MIMO channel, it was shown that the random codes from the 
ensemble proposed in [6] may be DMT optimally decoded 
by a lattice decoder (whereby the constellation boundaries are 
ignored in the decoding process). This was accomplished by 
the inclusion of the MMSE-GDFE pre-processing step and a 
random lattice translate. It should, however, be noted that an 
exact implementation of the MMSE-GDFE lattice decoder still 
requires the solution to a CVP, which is NP-hard in general 
[43]. Currently, except for the Alamouti transceiver structure 
[1] over the $2 \times 1$ quasi-static MISO channel, all known
DMT optimal explicit, non-random, transceivers employ ML 
detection, and incur worst-case complexity that is exponential 
in the data rate. 

B. Principal results and outline 

The contribution of this work lies in the identification of a 
large class of scenarios where efficient variants of LR-aided 
linear lattice decoding, which is a generally suboptimal but 
computationally advantageous decoding strategy, achieve the 
diversity of the ML decoder. The work also presents the first 
explicit characterization of efficient non-ML encoder-decoder 
structures that meet the fundamental DMT performance limits, 
for very general channel statistics, dimensions, and models. 
DMT optimality is shown to be achieved with the smallest 
known complexity order among all DMT-optimal decoders that 
apply to general lattice designs. 

As a first step towards providing computationally efficient
DMT optimality, Theorem 1 in Section III-C proves that
regularized lattice decoders are DMT optimal. The proposed 
class of decoders employs an unconstrained lattice search in 
a regularized metric which applies an incremental penaliza- 
tion to lattice points further from the origin. The decoder 
structure includes, as a special case, the MMSE-GDFE lattice 
decoder [6], [24]. The DMT optimality holds irrespective of 
the channel's fading statistics and irrespective of the lattice 
design which is decoded (c.f., [7]-[11], [13]-[16], [18]-[22]),
as long as the lattice design and fading distribution jointly
induce a (right) continuous² DMT curve (c.f., [5]) under ML
detection. Currently all known DMT curves for the system 
models considered herein are continuous except possibly at 
the maximal multiplexing gain. The result holds also when 
ML decoding, due to suboptimality of the code applied, does 
not achieve the fundamental DMT of the channel. This further 
strengthens the view of regularized lattice decoding as a DMT 
optimal decoding strategy. 

As a second step towards computationally efficient DMT
optimality, Theorem 2 in Section IV-A extends the above
result to the class of all C-approximate implementations of
regularized lattice decoders. Two decoders are here said to be
C-approximate when their minimum metrics are at a distance
less than some constant C (c.f., Section IV-A). The DMT
optimality of LLL-based LR-aided linear decoders, being
C-approximate decoders, is then established by Corollary 2a.

²A similar continuity assumption is required (although not explicitly stated)
in establishing the DMT optimality of approximately universal codes, c.f., [12,
Th. 3.1].

Theorem 3 in Section IV-C then considers the computational
complexity of the LR-aided solutions and proves that
LR-aided DMT optimal decoding is feasible at a worst-case
complexity of $O(\log \rho)$ where $\rho$ denotes the SNR, i.e., at
a complexity which grows only linearly in the data rate.
With LLL LR worst-case complexity known to be generally 
unbounded [44], the upper bound is guaranteed by exploiting 
channel information at the receiver and rigorously relating 
lattices that result in high probability of error, to lattices 
that may induce high LR complexity. The bound quantifies, 
in the scale of interest, the fundamental reduction in the 
decoding complexity of the proposed explicit transceivers, 
when compared to the ML decoder which has a complexity 
that is generally exponential in the rate. It also resolves, in 
the negative, the long standing open problem of whether DMT 
optimality requires a complexity that is exponential in rate. 

Section V considers different generalizations including the
case of nested lattice designs, partial channel knowledge,
general and possibly non-Gaussian noise characteristics, and
provides a discussion of the case where the diversity-multiplexing
characteristic of some scenario is discontinuous and/or
unknown. Section VI then shows how the result directly
applies to several pertinent computationally demanding
communication scenarios such as MIMO-OFDM, ISI,
amplify-and-forward, decode-and-forward and MIMO-ARQ settings,
in all of which the DMT optimality of the efficient decoders is
guaranteed, again for any lattice design and fading distribution.
Conclusions are provided in Section VII.

C. Notation 

$\mathbb{Z}$, $\mathbb{R}$ and $\mathbb{C}$ respectively denote the integer, the real and
the complex numbers. $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$ denote the sets of
$n$-dimensional real vectors and $m \times n$-dimensional real matrices.
Similar definitions apply to $\mathbb{Z}$ and $\mathbb{C}$. Vectors and matrices are
respectively denoted by lower- and upper-case bold letters,
i.e., $\mathbf{x}$ and $\mathbf{X}$. The identity matrix is denoted $I$ and its size
is made clear by the context. The all-zeros vector or matrix
is denoted $0$. $X^{\mathsf T}$, $X^{\mathsf H}$ and $X^{-1}$ denote the transpose,
conjugate transpose and inverse of a matrix $X$. $\|x\|$ denotes
the Euclidean norm of $x$, and $\|X\|_F$ the Frobenius norm
of $X$. No notational difference is made between random
variables (vectors and matrices) and their realizations. The
multivariate real valued Gaussian distribution with zero mean
and covariance $I$ is denoted $\mathcal{N}(0, I)$.

II. System model 
A. The generic MIMO channel 

We consider a generic $n \times m$ (real) MIMO channel model

$$ y = Hx + w \tag{1} $$

where $y \in \mathbb{R}^m$, $H \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$ and $w \in \mathbb{R}^m$. The
transmitted codewords $x$ are assumed to be uniformly distributed
over some codebook $\mathcal{X} \subset \mathbb{R}^n$, and statistically independent
of $H$. The noise is assumed to be i.i.d. Gaussian with unit
variance, i.e., $w \sim \mathcal{N}(0, I)$. Under these assumptions the
optimal decoder, in the sense that it minimizes the probability 
of codeword error, is the ML decoder given by

$$ \hat{x}_{ML} = \arg\min_{\hat{x} \in \mathcal{X}} \|y - H\hat{x}\|^2 . \tag{2} $$

The channel $H$ is assumed random (i.e., fading) with a
distribution parameterized by a real parameter $\rho > 0$. The
parameter $\rho$ will throughout be interpreted as the SNR of the
channel, although this is strictly speaking not required for the
analysis. We assume that one use of (1) corresponds to $T$
uses of some underlying "physical" channel, which motivates
a definition of the rate in terms of bits per channel use (bpcu)
according to

$$ R = \frac{1}{T} \log_2 |\mathcal{X}| \tag{3} $$

where $|\mathcal{X}|$ denotes the cardinality or size of $\mathcal{X}$. The model
in (1) is known to encompass many pertinent communication
scenarios (c.f., [24]), and several explicit examples are
provided in Section VI. The obtained results hold in the general
setting unless otherwise explicitly stated.
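
As an illustration of (2) and (3), the following minimal Python sketch implements the ML decoder by exhaustive search over a small codebook and evaluates the rate. The toy codebook, the channel matrix, the noise values and the function names are assumptions chosen for illustration only; they are not taken from the paper, and exhaustive search is practical only for very small codebooks.

```python
import math
from itertools import product


def rate_bpcu(codebook_size, T):
    """Rate R = (1/T) * log2|X| in bits per channel use, as in (3)."""
    return math.log2(codebook_size) / T


def matvec(H, x):
    """Matrix-vector product H x for a list-of-rows matrix."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


def ml_decode(y, H, codebook):
    """Exhaustive-search ML decoder of (2): argmin over the codebook of ||y - Hx||^2."""
    def metric(x):
        return sum((yi - hi) ** 2 for yi, hi in zip(y, matvec(H, x)))
    return min(codebook, key=metric)


# Hypothetical toy setup: 16 codewords in R^2, one use of (1) spans T = 2 channel uses.
codebook = [tuple(p) for p in product([-3, -1, 1, 3], repeat=2)]
H = [[1.0, 0.5], [0.2, 1.0]]
x = (1, -3)
w = [0.1, -0.2]                      # one fixed noise realization
y = [hx + wi for hx, wi in zip(matvec(H, x), w)]

x_hat = ml_decode(y, H, codebook)
R = rate_bpcu(len(codebook), T=2)
```

The search visits every codeword, so its cost grows as $2^{RT}$, which is the exponential-in-rate complexity that the remainder of the paper seeks to avoid.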

B. The diversity-multiplexing tradeoff

Following [5] we refer to a family of codes, $\mathcal{X}(\rho)$, parameterized
by $\rho$ as a scheme and define the multiplexing gain $r$
of the scheme according to

$$ r = \lim_{\rho \to \infty} \frac{R(\rho)}{\log_2 \rho} = \lim_{\rho \to \infty} \frac{\log |\mathcal{X}(\rho)|}{T \log \rho} . \tag{4} $$

As we will be interested in the system behavior as a function
of the multiplexing gain $r$, we will use the term design to
denote a set of schemes over some range of $r$. In this sense
we would consider the Alamouti code [1] or V-BLAST [3]
with appropriately chosen constellations as designs (c.f., [5,
Section VII]). We will in what follows write $\mathcal{X}_r$ to express
the dependence of the codebook (or more appropriately the
sequence of codebooks) on $r$, while the dependence on
$\rho$ is suppressed for notational reasons. The diversity gain of
the design under ML decoding is given, as a function of $r$,
according to (c.f., [5])

$$ d_{ML}(r) = - \lim_{\rho \to \infty} \frac{\log P(\hat{x}_{ML} \neq x)}{\log \rho} \tag{5} $$

(provided the limit exists) where $x$ is assumed uniformly
distributed over $\mathcal{X}_r$ and where $\hat{x}_{ML}$ is given by (2) for
$\mathcal{X} = \mathcal{X}_r$. The expression in (5) will in general define a tradeoff
between the multiplexing gain and diversity gain, particular to
the design and channel at hand [5].

As shown in [5, Lemma 5] the diversity gain $d_{ML}(r)$ is,
under the power constraint $E\{\|x\|^2\} \leq T$, upper bounded
by the outage exponent $d_{out}(r)$ where

$$ d_{out}(r) = - \lim_{\rho \to \infty} \frac{\log P\left( \log \det(I + HH^{\mathsf T}) < 2RT \right)}{\log \rho} . \tag{6} $$

In the case of the i.i.d. Rayleigh fading quasi-static MIMO
channel (c.f., Section VI-A), $d_{out}(r)$ is given by the piecewise
linear curve connecting $(k, (n_T - k)(n_R - k))$ for $k =
0, 1, \ldots, \min(n_T, n_R)$ [5]. Similar results have been obtained
for other fading distributions [45]. A code is said to be
approximately universal [12] for the particular system model
at hand if $d_{ML}(r) = d_{out}(r)$ under any fading distribution.
For the $n_T \times n_R$ quasi-static MIMO channel, approximately
universal codes have been constructed for all $r$, $n_R$ and $n_T$
provided $T \geq n_T$ [7], [8].
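
The piecewise linear outage exponent for the i.i.d. Rayleigh fading quasi-static MIMO channel can be evaluated directly from its corner points $(k, (n_T - k)(n_R - k))$ of [5]. The sketch below is a minimal illustration; the function name `outage_exponent_rayleigh` is a hypothetical choice and not part of the paper.

```python
def outage_exponent_rayleigh(r, n_t, n_r):
    """Linear interpolation between the corner points (k, (n_t - k)(n_r - k)),
    k = 0, 1, ..., min(n_t, n_r), of the i.i.d. Rayleigh outage exponent."""
    k_max = min(n_t, n_r)
    if not 0 <= r <= k_max:
        raise ValueError("multiplexing gain must lie in [0, min(n_t, n_r)]")
    k = min(int(r), k_max - 1)                 # left corner of the active segment
    d_left = (n_t - k) * (n_r - k)
    d_right = (n_t - k - 1) * (n_r - k - 1)
    return d_left + (r - k) * (d_right - d_left)


# Example: the 2 x 2 channel has corner points (0, 4), (1, 1) and (2, 0).
curve_2x2 = [outage_exponent_rayleigh(r, 2, 2) for r in (0, 0.5, 1, 1.5, 2)]
```

A design is DMT optimal on this channel when its diversity gain $d_{ML}(r)$ coincides with this curve for every $r$.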

As frequently done in works on the DMT, we will make
use of the $\doteq$ notation where $f(\rho) \doteq \rho^{x}$ iff (c.f., [5])

$$ \lim_{\rho \to \infty} \frac{\log f(\rho)}{\log \rho} = x . \tag{7} $$

The symbols $\dot{\geq}$ and $\dot{\leq}$ are defined similarly. In this notation
a scheme has multiplexing gain $r$ if $|\mathcal{X}| \doteq \rho^{rT}$ and diversity
gain $d$ under ML decoding if $P(\hat{x}_{ML} \neq x) \doteq \rho^{-d}$.
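
The exponential equality in (7) ignores constant factors and sub-polynomial terms, which a short numerical experiment makes concrete. In the sketch below, the function $f(\rho) = 5 \rho^{-2} \log \rho$ is an arbitrary example chosen for illustration (it is not from the paper); the ratio $\log f(\rho) / \log \rho$ approaches $-2$, so that $f(\rho) \doteq \rho^{-2}$, although the convergence is slow.

```python
import math


def exponent_estimate(f, rho):
    """Finite-SNR estimate log f(rho) / log rho of the exponent in (7)."""
    return math.log(f(rho)) / math.log(rho)


def f(rho):
    """Hypothetical error probability proxy: constant and log factors times rho^-2."""
    return 5.0 * rho ** -2 * math.log(rho)


# Evaluate the estimate at increasingly large SNR values.
estimates = [exponent_estimate(f, 10.0 ** k) for k in (2, 4, 8, 16, 32, 64)]
```

The slow convergence is a reminder that DMT statements describe the high-SNR scaling only and say little about the SNR gap at moderate $\rho$.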

III. Lattice codes and decoding 

A. Lattice designs 

An $n$-dimensional real valued lattice $\Lambda$ is the discrete
additive subgroup of $\mathbb{R}^n$ given by

$$ \Lambda = \{ Gz \mid z \in \mathbb{Z}^n \} . \tag{8} $$

The full rank matrix $G \in \mathbb{R}^{n \times n}$ is referred to as the generator
matrix of $\Lambda$. We shall throughout consider a class of designs
given as follows.

Definition 1 (Lattice design): A lattice design is defined by
the pair $(\Lambda, \mathcal{R})$ where $\Lambda \subset \mathbb{R}^n$ is a lattice and $\mathcal{R}$ is a compact
(i.e., closed and bounded) convex subset of $\mathbb{R}^n$, which contains
$0$ in its interior. For $r \geq 0$ the sequence of lattice codes $\mathcal{X}_r$ is
given by $\mathcal{X}_r = \Lambda_r \cap \mathcal{R}$ where $\Lambda_r = \phi_r \Lambda$ and $\phi_r = \rho^{-rT/n}$.

As in [6], we refer to $\mathcal{R}$ as the shaping region of the lattice
design. It is important to note that we assume that $\mathcal{R}$ and $\Lambda$ are
fixed and independent of $\rho$ and that, in general, $\mathcal{R}$ has to be
appropriately chosen so that the design satisfies the given power
constraint, e.g., $E\{\|x\|^2\} \leq T$. This definition of a lattice
design is slightly more restrictive than the definition of lattice
space-time codes considered in [6], in that we require the same
lattice (and shaping region) to be used for all multiplexing
gains $r$ and SNR $\rho$. Note, however, that while we restrict
the maximum value of $\|x\|^2$ by the shaping region, we are
not restricting the analysis to short-term power constraints, as
long-term power allocation policies may often be considered
part of the effective channel $H$.

It is straightforward to verify that the multiplexing gain
of $\mathcal{X}_r$ is indeed $r$. By a principle, dating back to Gauss,
stating that the number of lattice points in a large set is well
approximated by the volume of the set, we have [46]

$$ |\mathcal{X}_r| = |\Lambda_r \cap \mathcal{R}| = |\phi_r \Lambda \cap \mathcal{R}| \approx \frac{V(\mathcal{R})}{V(\mathcal{V}_\Lambda)} \rho^{rT} \tag{9} $$

where $V(\mathcal{R})$ and $V(\mathcal{V}_\Lambda)$ denote the volume of the shaping
region and the fundamental (Voronoi) cell of $\Lambda$ respectively.
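
The volume approximation in (9) is easy to check numerically for a small example. The sketch below counts the points of $\phi_r \Lambda \cap \mathcal{R}$ for a two-dimensional lattice and compares the count with $V(\mathcal{R}) \rho^{rT} / V(\mathcal{V}_\Lambda)$. The generator matrix, the square shaping region $\mathcal{R} = [-1, 1]^2$ and the function name are assumptions made purely for illustration.

```python
import math


def count_codewords(G, rho, r, T):
    """Count points of phi_r * Lambda inside R = [-1, 1]^2 for a 2 x 2 generator G,
    and return the count together with the volume approximation of (9)."""
    n = 2
    phi = rho ** (-r * T / n)
    (a, b), (c, d) = G
    det = a * d - b * c
    # z = G^{-1} x / phi, so |z_i| <= (sum_j |Ginv_ij|) / phi for any x in [-1, 1]^2.
    G_inv = [[d / det, -b / det], [-c / det, a / det]]
    bounds = [math.ceil(sum(abs(g) for g in row) / phi) for row in G_inv]
    count = 0
    for z1 in range(-bounds[0], bounds[0] + 1):
        for z2 in range(-bounds[1], bounds[1] + 1):
            x1 = phi * (a * z1 + b * z2)
            x2 = phi * (c * z1 + d * z2)
            if abs(x1) <= 1 and abs(x2) <= 1:
                count += 1
    volume_estimate = 4.0 / abs(det) * rho ** (r * T)   # V(R) = 4, V(Voronoi) = |det G|
    return count, volume_estimate


# Hypothetical example: rho = 100, r = 1, T = 2, n = 2.
count, estimate = count_codewords([[1.0, 0.5], [0.0, 1.0]], rho=100.0, r=1.0, T=2)
```

The relative error of the approximation is a boundary effect and shrinks as $\rho$ grows, which is why the codebook size scales as $\rho^{rT}$.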

The assumption that $G$ is a square matrix can be made
without loss of generality. To see this assume that $G \in \mathbb{R}^{n \times k}$
where $k < n$ and note that for $x \in \Lambda$ we have $x = Gz$ for
$z \in \mathbb{Z}^k$. Write $G = UG'$ where $U \in \mathbb{R}^{n \times k}$ has orthonormal
columns (so that $U^{\mathsf T} U = I$) and $G' \in \mathbb{R}^{k \times k}$ is full rank, let
$H' = HU$ and $x' = G'z$. We obtain $Hx = HGz = HUG'z =
H'U^{\mathsf T}UG'z = H'G'z = H'x'$, i.e., transmitting $x$ over $H$ is equivalent to
transmitting $x'$ over $H'$. As no explicit assumption is made
regarding the fading distribution of $H$, we may equivalently
consider the channel given by $H'$, and use the square generator
matrix $G'$ in the formulation of the lattice design. The two
equivalent cases naturally result in the same DMT curve. On
the other hand, if $k > n$ we may extend $G \in \mathbb{R}^{n \times k}$ to a $k \times k$
full rank matrix by the addition of $k - n$ linearly independent
rows, while adding $k - n$ columns containing zeros to $H$ in the
corresponding positions, thus leaving the input-output relation
of (1) unaltered.
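
The reduction to a square generator matrix for $k < n$ can be verified numerically. The sketch below uses classical Gram-Schmidt orthonormalization to obtain one possible factorization $G = UG'$ and checks that $HGz = H'G'z$ with $H' = HU$. The specific matrices and helper names are hypothetical, and Gram-Schmidt is only one convenient way of producing the factorization.

```python
def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def gram_schmidt(G):
    """Return U (n x k, orthonormal columns) and Gp (k x k, upper triangular)
    such that G = U Gp. Assumes G has full column rank."""
    n, k = len(G), len(G[0])
    cols = [[G[i][j] for i in range(n)] for j in range(k)]
    U_cols = []
    Gp = [[0.0] * k for _ in range(k)]
    for j, v in enumerate(cols):
        u = list(v)
        for i, q in enumerate(U_cols):
            Gp[i][j] = sum(a * b for a, b in zip(q, v))
            u = [a - Gp[i][j] * b for a, b in zip(u, q)]
        Gp[j][j] = sum(a * a for a in u) ** 0.5
        U_cols.append([a / Gp[j][j] for a in u])
    U = [[U_cols[j][i] for j in range(k)] for i in range(n)]
    return U, Gp


# Hypothetical example with n = 3, k = 2 and m = 2.
G = [[1.0, 0.5], [0.0, 1.0], [1.0, -1.0]]
H = [[1.0, 0.5, -0.3], [0.2, 1.0, 0.7]]
z = [[2.0], [-1.0]]

U, G_prime = gram_schmidt(G)
H_prime = matmul(H, U)
lhs = matmul(H, matmul(G, z))               # H x  with x  = G z
rhs = matmul(H_prime, matmul(G_prime, z))   # H'x' with x' = G'z
```

Up to floating-point rounding, `lhs` and `rhs` coincide, which is the statement that transmitting $x$ over $H$ is equivalent to transmitting $x'$ over $H'$.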

The definition of a lattice design admits most of the codes
mentioned in Section I-A in a straightforward manner in the
sense that the code construction may be completely described
by the pair $(\Lambda, \mathcal{R})$. The largest subclass of lattice codes,
generally known as linear dispersion codes (c.f., [4] and [6]),
additionally satisfy $x = \phi_r \sum_{i=1}^{n/2} (a_i \alpha_i + b_i \beta_i)$ for some fixed
$a_i, b_i \in \mathbb{R}^n$, $i = 1, \ldots, n/2$, where $\alpha_i$ and $\beta_i$ constitute the
real and imaginary part of a complex constituent data symbol
chosen from a suitable constellation, e.g., a QAM or HEX
[47] constellation. The structure of the linear dispersion codes
provides efficient encoding, and naturally yields a shaping
region $\mathcal{R}$ in the form of an orthotope with axes aligned
with the columns of the corresponding generator matrix $G =
[a_1, b_1, \ldots, a_{n/2}, b_{n/2}]$. The class of linear dispersion codes
includes the constructions in [7]-[11], [13], [14], [18]-[22]
as well as many classical designs [1]-[3]. Also the codes
with reduced decoding complexity in [38]-[42] belong to this
class of codes. It is known that a better shaping gain may be
achieved through a more careful design of the shaping region
$\mathcal{R}$ (c.f., [6], [15]).

Before continuing, two remarks are in order. While [15]
defines single lattices which provide strong lattice codes, the
specific encoding strategy proposed in [15] will in general also
introduce a (pseudo-random) translate of the lattice $\Lambda_r$. This
is not covered by our basic definition of lattice designs which
specifies the code exclusively in terms of $\Lambda$ and $\mathcal{R}$. Although
the results presented in the following straightforwardly extend
to cover such lattice translates, we shall in the interest of
notational simplicity not consider this at first. Instead, we
outline the changes required to handle this generalization in
Section V. Furthermore, we remark that we make no assumptions
regarding the optimality of the code design itself, i.e.,
we do not assume that $d_{ML}(r) = d_{out}(r)$, and consequently
the results are applicable also to suboptimal designs such as,
e.g., V-BLAST.

B. Lattice decoding 

The ML decoder in (2) implements a search for the codeword
closest to $y$ over $\mathcal{X}_r = \Lambda_r \cap \mathcal{R}$ [23], [24]. As in [6], [24]
we use the term lattice decoding to refer to an unconstrained
search over $\Lambda_r$, i.e., a search where the constraint imposed
by $\mathcal{R}$ is ignored by the decoder. The rationale behind such an
approach is that it symmetrizes the problem and allows for the
structure of the lattice to be exploited in order to reduce the
computational complexity of the decoder [23]-[25].



The naive lattice decoder (c.f., [6]) is obtained by simply
removing the constraint imposed by $\mathcal{R}$ in the ML decoder
while keeping the decision metric unaltered, i.e.,

$$ \hat{x}_{NL} = \arg\min_{\hat{x} \in \Lambda_r} \|y - H\hat{x}\|^2 . \tag{10} $$

In the event that $\hat{x}_{NL} \notin \mathcal{X}_r$ the decoder declares an error. It
is known that the performance loss incurred by neglecting the
codebook boundary $\mathcal{R}$ may in this case be substantial, and
that the naive lattice decoder is not DMT optimal in general
[6], [33]. Still, as proved in [6] for the i.i.d. Rayleigh fading
quasi-static MIMO channel, the problem does not lie with
lattice decoding per se, but with the naive implementation.
In particular, it was shown by a random coding argument that,
after an appropriate alteration of the decoding metric, lattice
coding and decoding is sufficient for achieving optimal DMT
performance in this scenario [6].

Intuitively, as the naive lattice decoder (10) is suboptimal
in terms of its diversity, it must mean that $\hat{x}_{NL} \notin \mathcal{R}$ with a
probability that is large in relation to $P(\hat{x}_{ML} \neq x)$, i.e., the
decoder is relatively likely to decide in favor of a codeword
outside the region defined by $\mathcal{R}$. As $\mathcal{R}$ is bounded it is
plausible that a regularization [48] of the decoding metric
may reduce the probability of "out of region" error events,
and improve the probability of error.

C. DMT optimality of regularized lattice decoding 
The (general) regularized lattice decoder is given by

$$ \hat{x}_L = \arg\min_{\hat{x} \in \Lambda_r} \|y - H\hat{x}\|^2 + \|\hat{x}\|_{\mathbf{T}}^2 \tag{11} $$

where $\|x\|_{\mathbf{T}}^2 \triangleq x^{\mathsf T} \mathbf{T} x$ for some given positive definite matrix
$\mathbf{T} = \mathbf{T}^{\mathsf T}$. The additive term $\|\hat{x}\|_{\mathbf{T}}^2$ applies an incremental
penalization to lattice points further from the origin, and
reduces the probability of error associated with codewords
outside of the shaping region. This notion is formalized by
the following theorem, which constitutes one of the main
contributions of this work, and states that (11) is a DMT
optimal decoding strategy for lattice designs, in a remarkably
general sense. The proof is given in Section III-D.

Theorem 1: For any lattice design $(\Lambda, \mathcal{R})$, and for any
fading distribution such that $d_{ML}(r)$ is (right) continuous at
$r$, the regularized lattice decoder is DMT optimal, i.e.,

$$ d_L(r) = d_{ML}(r), \tag{12} $$

where

$$ d_L(r) \triangleq - \lim_{\rho \to \infty} \frac{\log P(\hat{x}_L \neq x)}{\log \rho}, \tag{13} $$

for $x$ uniformly distributed over $\mathcal{X}_r$, and $\hat{x}_L$ given by (11).

Before proving Theorem 1 we remark that for $\mathbf{T} = I$ the
regularized decoder is equivalent to the MMSE-GDFE decoder
considered in [6], if we neglect the lattice translate considered
therein. In particular, the regularized lattice decoder in (11) is
equivalently given by (c.f., Appendix A)

$$ \hat{x}_L = \arg\min_{\hat{x} \in \Lambda_r} \|Fy - S\hat{x}\|^2 \tag{14} $$

where $F \in \mathbb{R}^{n \times m}$ and $S \in \mathbb{R}^{n \times n}$ are the MMSE-GDFE forward
and feedback filters [6]. This equivalence is interesting in light
of the fact that the motivation of the MMSE-GDFE decoder
in [6] was largely information theoretic in nature, while the
regularization view is arguably of a more signal processing
flavor. Theorem 1 thus extends the results of [6] and proves
DMT optimality of MMSE-GDFE decoding for any lattice
design based on a single, fixed, generator matrix. We also note
that although the specific matrix $\mathbf{T}$ in (11) has no effect on
the diversity gain (provided $\mathbf{T}$ is full rank) it may significantly
affect the coding gain and should in practice be chosen based
on the shaping region, code, and channel statistics.
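
The effect of the regularization term can be seen in a small numerical example. The sketch below compares the naive metric of (10) with the regularized metric of (11) for $\mathbf{T} = I$ on the integer lattice $\mathbb{Z}^2$. The ill-conditioned channel, the received vector, the shaping region $[-1, 1]^2$ and the finite search box (a stand-in for the unconstrained lattice search) are all assumptions made for illustration; this is a brute-force sketch and not an efficient decoder.

```python
from itertools import product


def metric(y, H, x, T=None):
    """||y - Hx||^2 + x^T T x; with T=None this is the naive metric of (10)."""
    n = len(x)
    Hx = [sum(H[i][j] * x[j] for j in range(n)) for i in range(len(H))]
    fit = sum((yi - hi) ** 2 for yi, hi in zip(y, Hx))
    if T is None:
        return fit
    penalty = sum(x[i] * T[i][j] * x[j] for i in range(n) for j in range(n))
    return fit + penalty


def lattice_decode(y, H, T=None, search_radius=6):
    """Brute-force search over Z^2 inside a finite box (illustrative stand-in
    for the unconstrained lattice search)."""
    candidates = product(range(-search_radius, search_radius + 1), repeat=2)
    return min(candidates, key=lambda x: metric(y, H, x, T))


def in_region(x):
    """Hypothetical shaping region R = [-1, 1]^2."""
    return all(abs(xi) <= 1 for xi in x)


H = [[3.0, 3.0], [0.0, 0.3]]      # ill-conditioned toy channel
x = (1, 1)                        # transmitted codeword, inside R
y = [6.0, 1.2]                    # y = Hx + w with w = (0, 0.9)
I2 = [[1.0, 0.0], [0.0, 1.0]]

x_naive = lattice_decode(y, H)         # naive lattice decoding, (10)
x_reg = lattice_decode(y, H, T=I2)     # regularized lattice decoding, (11)
```

In this example the naive decoder settles on the lattice point $(-2, 4)$, which lies outside the shaping region and is therefore declared an error, whereas the penalty term steers the regularized decoder back to the transmitted codeword $(1, 1)$.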

D. Proof of Theorem 1

We begin by providing the following lemma, proven in
Appendix B. The purpose of the lemma is to connect the
probability of ML error with the existence of a small codeword
difference $\|H(x_1 - x_2)\|^2$ where $x_1$ and $x_2$ belong to a subset
of the codebook. In essence, the lemma provides a "deep fade
typical error" probability bound in line with [28, Ch. 3].

Lemma 1: Let $\mathcal{B}$ be the spherical region given by

$$ \mathcal{B} \triangleq \{ d \in \mathbb{R}^n \mid \|d\|^2 \leq \gamma \} \tag{15} $$

where the radius $\gamma > 0$ (independent of $\rho$) is chosen such that
$d_1 + d_2 \in \mathcal{R}$ for any $d_1, d_2 \in \mathcal{B}$. Let

$$ \nu_r \triangleq \min_{d \in \mathcal{B} \cap \Lambda_r : d \neq 0} \tfrac{1}{4} \|Hd\|^2 . \tag{16} $$

Then, for any $r \geq 0$ it holds that

$$ \limsup_{\rho \to \infty} \frac{\log P(\nu_r < 1)}{\log \rho} \leq -d_{ML}(r) . \tag{17} $$

The existence of the set $\mathcal{B}$ in (15) follows by the assumption
that $0$ is contained in the interior of $\mathcal{R}$. Now, let $\zeta > 0$ be
given and choose $\delta > 0$ such that

$$ \frac{2 \zeta T}{n} > \delta > 0 . \tag{18} $$

This may clearly be done for arbitrary $\zeta > 0$. We will in
the following assume that $\nu_{r+\zeta} \geq 1$ and that $\|w\|^2 \leq \rho^{\delta}$,
and prove that these two conditions are sufficient for a correct
decision by the regularized lattice decoder in (11), provided
that $\rho$ is sufficiently large. Hence, in order for an error to occur
at large $\rho$, one of the assumptions must fail.

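To this end, consider first the metric in (11) for the transmitted
codeword $x$, i.e.,

$$ \|y - Hx\|^2 + \|x\|_{\mathbf{T}}^2 \leq \rho^{\delta} + c \tag{19} $$

where $y - Hx = w$ and $\|w\|^2 \leq \rho^{\delta}$ was used, and where

$$ c \triangleq \max_{\mathbf{r} \in \mathcal{R}} \|\mathbf{r}\|_{\mathbf{T}}^2 . $$

Note that $c < \infty$ as $\mathcal{R}$ is bounded and that $c$ is independent
of the transmitted codeword $x$ and $\rho$.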

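In order to bound the metric for $\hat{x} \in \Lambda_r$ where $\hat{x} \neq x$, we
note that $\nu_{r+\zeta} \geq 1$ implies

$$ \tfrac{1}{4} \|Hd\|^2 \geq 1 \quad \forall d \in \mathcal{B} \cap \Lambda_{r+\zeta},\ d \neq 0, \tag{20} $$

by the definition in (16). As $\Lambda_r = \rho^{\zeta T/n} \Lambda_{r+\zeta}$ it follows that

$$ \tfrac{1}{4} \|Hd\|^2 \geq \rho^{2\zeta T/n} \quad \forall d \in \rho^{\zeta T/n} \mathcal{B} \cap \Lambda_r,\ d \neq 0 \tag{21} $$

after scaling the vectors $d$ in (20) by $\rho^{\zeta T/n}$. As $\mathcal{R}$ is bounded, and as $\zeta > 0$, it
holds that $\mathcal{R} \subset \tfrac{1}{2} \rho^{\zeta T/n} \mathcal{B}$ for all $\rho \geq \rho_1$, given some sufficiently
large $\rho_1$. This implies that $x \in \tfrac{1}{2} \rho^{\zeta T/n} \mathcal{B}$ for $\rho \geq \rho_1$ since
$x \in \mathcal{R}$. It is important to note here that while $\rho_1$ may depend
on $\zeta$ and $\mathcal{R}$, it can be chosen independent of the particular $x$
transmitted.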

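For any $\hat{x} \in \tfrac{1}{2} \rho^{\zeta T/n} \mathcal{B} \cap \Lambda_r$, $\hat{x} \neq x$, it holds that
$d = x - \hat{x} \in \rho^{\zeta T/n} \mathcal{B} \cap \Lambda_r$. By (21) we have

$$\tfrac{1}{4} \|H(x - \hat{x})\|^2 = \tfrac{1}{4} \|Hd\|^2 \ge \rho^{2\zeta T/n} \tag{22}$$

where $d = x - \hat{x}$. As $\|w\|^2 \le \rho^{\delta}$ it follows by (22) and
(18) that $\|H(x - \hat{x})\|^2 \gg \|w\|^2$ for large $\rho$. In particular, there is
some $\rho_2 \ge \rho_1$, independent of $x$ and $\hat{x}$, for which the triangle
inequality implies that

$$\|y - H\hat{x}\|^2 = \|H(x - \hat{x}) + w\|^2 \ge \rho^{2\zeta T/n}$$

for all $\rho \ge \rho_2$. Consequently,

$$\|y - H\hat{x}\|^2 + \|\hat{x}\|_T^2 \ge \rho^{2\zeta T/n} \tag{23}$$

for any $\hat{x} \in \Lambda_r$ where $\hat{x} \in \tfrac{1}{2} \rho^{\zeta T/n} \mathcal{B}$ and $\rho \ge \rho_2$.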



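In the case that $\hat{x} \notin \tfrac{1}{2} \rho^{\zeta T/n} \mathcal{B}$, it follows by the definition
in (15) that $\|\hat{x}\|^2 \ge \tfrac{1}{4} \gamma \rho^{2\zeta T/n}$, which implies
$\|\hat{x}\|_T^2 \ge \tfrac{1}{4} \gamma \lambda_{\min}(T) \rho^{2\zeta T/n}$ where $\lambda_{\min}(T) > 0$ denotes the minimum
eigenvalue of $T$. It follows that

$$\|y - H\hat{x}\|^2 + \|\hat{x}\|_T^2 \ge \tfrac{1}{4} \gamma \lambda_{\min}(T) \rho^{2\zeta T/n} \tag{24}$$

for any $\hat{x} \notin \tfrac{1}{2} \rho^{\zeta T/n} \mathcal{B}$.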
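Let

$$a(\rho) \triangleq \rho^{\delta} + c \quad \text{and} \quad b(\rho) \triangleq \min\left(1, \tfrac{1}{4} \gamma \lambda_{\min}(T)\right) \rho^{2\zeta T/n} \tag{25}$$

and note that (18) implies that there is some $\rho_3 \ge \rho_2$, again
independent of $x$ and $\hat{x}$, for which $a(\rho) < b(\rho)$ for all $\rho \ge \rho_3$.
For the transmitted codeword $x$ we have by (19) that

$$\|y - Hx\|^2 + \|x\|_T^2 \le a(\rho) .$$

For any other $\hat{x} \in \Lambda_r$ (i.e., $\hat{x} \in \Lambda_r \setminus \{x\}$) it holds by (23) and
(24) that

$$\|y - H\hat{x}\|^2 + \|\hat{x}\|_T^2 \ge b(\rho) > a(\rho) \tag{26}$$

for all $\rho \ge \rho_3$. This implies that the transmitted codeword
yields the minimum metric in (11), or equivalently that $\hat{x}_{\mathrm{L}} = x$
as long as $\rho \ge \rho_3$ and under the assumptions that $\nu_{r+\zeta} \ge 1$
and $\|w\|^2 \le \rho^{\delta}$. For an error to occur when $\rho \ge \rho_3$ it is thus
required that $\nu_{r+\zeta} < 1$ or $\|w\|^2 > \rho^{\delta}$.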

Applying the union bound to the probability of error yields

$$P(\hat{x}_{\mathrm{L}} \neq x) \le P(\nu_{r+\zeta} < 1) + P(\|w\|^2 > \rho^{\delta}) \tag{27}$$

for $\rho \ge \rho_3$. As $P(\|w\|^2 > \rho^{\delta}) \doteq \rho^{-\infty}$, due to the exponential
tail of the Gaussian distribution, the second term in (27)
is asymptotically irrelevant. By Lemma 1 it follows that
$P(\nu_{r+\zeta} < 1) \,\dot{\le}\, \rho^{-d_{\mathrm{ML}}(r+\zeta)}$. Note here also that Lemma 1
is applicable even when $r = 0$ since it is applied at a
multiplexing gain of $r + \zeta > 0$. It follows that

$$\limsup_{\rho \to \infty} \frac{\log P(\hat{x}_{\mathrm{L}} \neq x)}{\log \rho} \le -d_{\mathrm{ML}}(r + \zeta) . \tag{28}$$

By observing that (28) holds for an arbitrary choice of $\zeta > 0$,
we may conclude that

$$\limsup_{\rho \to \infty} \frac{\log P(\hat{x}_{\mathrm{L}} \neq x)}{\log \rho} \le -d_{\mathrm{ML}}(r) \tag{29}$$

for any $r \ge 0$, provided that

$$\lim_{\zeta \to 0^+} d_{\mathrm{ML}}(r + \zeta) = d_{\mathrm{ML}}(r) ,$$

i.e., provided $d_{\mathrm{ML}}(r)$ is right continuous at $r$. As
$P(\hat{x}_{\mathrm{L}} \neq x) \ge P(\hat{x}_{\mathrm{ML}} \neq x)$ due to the optimality of the ML
decoder it holds that

$$\liminf_{\rho \to \infty} \frac{\log P(\hat{x}_{\mathrm{L}} \neq x)}{\log \rho} \ge -d_{\mathrm{ML}}(r) ,$$

which combined with (29) establishes the claim of Theorem 1.

E. A geometric example 

In order to provide further intuition into the suboptimality
of the naive lattice decoder, and the argument made in
Section III, it is useful to consider the example provided in Fig. 1
where $\Lambda_r$ is a scaled version of the integer lattice and
where the shaping region $\mathcal{R}$ is spherical. The images of $\Lambda_r$ and
$\mathcal{R}$ under the linear map induced by $H$ are shown in Fig. 1(b).
In the example, $H \in \mathbb{R}^{2 \times 2}$ is nearly rank deficient. For the
illustration, $\sigma_1(H) = 40 \sigma_2(H)$ where $\sigma_i(H)$ denotes the $i$th
singular value of $H$.

We will in the following discussion assume that $x = 0$
corresponds to the transmitted codeword and, for simplicity,
that $T = I$. As seen in Fig. 1(b), no other codeword
$\hat{x} \in \mathcal{X}_r \setminus \{x\}$ is mapped close to $Hx$ by the linear map $H$.
Thus, the ML decoder is unlikely to make an error. However,
when considering decoding to the full lattice $\Lambda_r$, the (naive)
lattice decoder is likely to decide in favor of the codeword
$\hat{x} \in \Lambda_r$ indicated in Fig. 1(a). This is a consequence of the fact
that $\hat{x}$ lies close to the space spanned by the right singular
vector corresponding to the smallest singular value of $H$ (c.f.
Fig. 1(a)). The closeness of $H\hat{x}$ to $Hx$ illustrates the problem
with the naive lattice decoder, i.e., even when no codewords
in $\mathcal{X}_r$ lie close to the space corresponding to a weak singular
value of $H$ it may be likely that a "hypothetical" codeword in
$\Lambda_r$ does. This view is strengthened by the observation that the
performance of the naive lattice decoder is often determined
by the statistics of the channel's weakest eigenmode (c.f. [6],
[33]), although the fixed-rate V-BLAST result in [32] provides
an exception to this rule.

The intuitive argument behind the regularization is that
any lattice point $\hat{x}$ (far) outside the constellation region $\mathcal{R}$,
which implies that $\|\hat{x}\|_T^2$ is large, is significantly penalized
by the regularized decision metric. For codewords $\hat{x} \neq x$
in $\mathcal{R}$ the first quadratic term in (11) will be large, unless
the ML decoder is also likely to be in error. Although this
heuristic argument fails for codewords $x$ close to the boundary
of $\mathcal{R}$, this problem may be circumvented under the continuity
assumption of Theorem 1 by considering a larger constellation
region, corresponding to the codebooks used at a marginally
higher multiplexing gain.

The effect of the regularization can also be seen in Fig. 1(c),
which shows the image of $\Lambda_r$ under the linear transformation of
the MMSE-GDFE feedback filter $B$ in (14), corresponding to
a regularized version of $H$. For the purpose of the illustration,
we have chosen $B$ so that it shares left and right singular
vectors with $H$. While the images of codewords inside $\mathcal{R}$ under
the transformations $H$ and $B$ are relatively similar (c.f. Fig.
1(b) and 1(c)), codewords outside the constellation $\mathcal{R}$ are more
affected by the change from $H$ to $B$. Note in particular the
difference between $\hat{y} = H\hat{x}$ and $\hat{r} = B\hat{x}$ in Fig. 1(b) and
1(c). Decoding to the closest lattice point in Fig. 1(c) is in this
case clearly a better approximation of the ML decoder than
decoding to the closest lattice point in Fig. 1(b).
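
The geometric argument can be reproduced numerically. The following is a minimal brute-force sketch, not the setup used to draw Fig. 1: the lattice scaling (0.5), the singular vectors, the noise vector, the unit-radius shaping region, and the helper names `lattice_points` and `decode` are all illustrative assumptions. Only the ratio $\sigma_1(H) = 40\sigma_2(H)$, $x = 0$ and $T = I$ are taken from the text.

```python
import itertools
import numpy as np

def lattice_points(G, box):
    """Enumerate G @ z for integer vectors z with entries in [-box, box]."""
    n = G.shape[1]
    for z in itertools.product(range(-box, box + 1), repeat=n):
        yield G @ np.array(z, dtype=float)

def decode(y, H, points, T=None):
    """Brute-force minimizer of ||y - H x||^2 + x^T T x over `points`.
    T=None drops the penalty term (naive lattice / ML metric)."""
    def metric(x):
        r = y - H @ x
        return r @ r + (0.0 if T is None else x @ T @ x)
    return min(points, key=metric)

# Assumed channel: U = I, right singular vectors v1, v2 as columns of V,
# singular values 4 and 0.1 (ratio 40, as in the text).
V = np.array([[1.0, 2.0], [-2.0, 1.0]]) / np.sqrt(5.0)
H = np.diag([4.0, 0.1]) @ V.T
G = 0.5 * np.eye(2)                      # scaled integer lattice (assumed scale)
x = np.zeros(2)                          # transmitted codeword x = 0
y = H @ x + np.array([0.0, 0.07])        # assumed small noise realization

full_lattice = list(lattice_points(G, box=4))
codebook = [p for p in full_lattice if np.linalg.norm(p) <= 1.0]  # spherical region

x_ml = decode(y, H, codebook)                      # ML over the codebook
x_naive = decode(y, H, full_lattice)               # naive lattice decoder
x_reg = decode(y, H, full_lattice, T=np.eye(2))    # regularized decoder, T = I

print("ML:", x_ml, "naive:", x_naive, "regularized:", x_reg)
```

In this assumed instance the naive decoder selects the lattice point (1, 0.5), which lies outside the shaping region along the weak singular direction, whereas both the ML decoder and the regularized decoder return the transmitted codeword.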

IV. Computationally Efficient Decoding 

A. DMT optimality of approximate lattice decoding 

Obtaining $\hat{x}_{\mathrm{L}}$ in (11) still requires the minimization of
a quadratic function over the discrete lattice $\Lambda_r$, a problem
which is NP-hard in general, even after pre-processing [43].
This implies that even if lattice reduction techniques are used
when obtaining the exact solution to (11), it is unlikely that
there will be any general techniques with a (worst-case)
complexity that grows sub-exponentially in the problem dimension
$n$, unless the code itself provides a structure that simplifies
decoding, such as for example in the case of orthogonal
designs [1], [2]. For most high-performance lattice codes no
such efficient solutions to (11) are known, which motivates the
study of suboptimal implementations of the regularized lattice
decoder.

The codeword $\hat{x}_{\mathrm{L}}$ is by definition the codeword which
provides the minimum metric in (11). A $C$-approximate solution
to (11) is any $\hat{x} \in \Lambda_r$ which for $C \ge 1$ satisfies

$$\xi(\hat{x}) \le C\, \xi(\hat{x}_{\mathrm{L}}) \quad \text{where} \quad \xi(\hat{x}) \triangleq \|y - H\hat{x}\|^2 + \|\hat{x}\|_T^2 . \tag{30}$$

An algorithm that for fixed $C$ is capable of producing a
$C$-approximate solution to (11), for arbitrary inputs $y \in \mathbb{R}^m$ and
$H \in \mathbb{R}^{m \times n}$, is referred to as a $C$-approximation algorithm
[49]. In what follows we prove that any $C$-approximation
algorithm for (11) is sufficient for DMT optimal decoding in
the sense of Theorem 1.
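
To make the definition concrete, the sketch below checks whether a candidate point is a $C$-approximate solution by comparing its regularized metric against a brute-force minimum. The function names `xi` and `is_c_approximate`, the finite search box, and the toy channel are assumptions made for illustration; the exhaustive search is only viable for tiny dimensions and is not an algorithm proposed in the text.

```python
import itertools
import numpy as np

def xi(y, H, T, x):
    """Regularized metric ||y - H x||^2 + x^T T x."""
    r = y - H @ x
    return float(r @ r + x @ T @ x)

def is_c_approximate(x_cand, C, y, H, G, T, box=4):
    """True if xi(x_cand) <= C * min xi over lattice points G z, z in [-box, box]^n.
    The box is an assumed truncation of the infinite lattice."""
    n = G.shape[1]
    best = min(
        xi(y, H, T, G @ np.array(z, dtype=float))
        for z in itertools.product(range(-box, box + 1), repeat=n)
    )
    return xi(y, H, T, x_cand) <= C * best

# Toy instance (all values assumed): the exact minimizer is (1, 0) with metric 1.09.
H = 3.0 * np.eye(2)
G = np.eye(2)
T = np.eye(2)
y = np.array([3.3, 0.0])

exact_ok = is_c_approximate(np.array([1.0, 0.0]), 1.0, y, H, G, T)
origin_c10 = is_c_approximate(np.zeros(2), 10.0, y, H, G, T)
origin_c9 = is_c_approximate(np.zeros(2), 9.0, y, H, G, T)
```

Here the origin has metric 10.89, so it qualifies as a 10-approximate solution but not as a 9-approximate one.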

Theorem 2: For any lattice design $(\Lambda, \mathcal{R})$, and fading
distribution such that $d_{\mathrm{ML}}(r)$ is (right) continuous at $r$, all
$C$-approximate implementations of the regularized lattice decoder
are DMT optimal provided $C$ is independent of $\rho$, i.e.,

$$d_{\mathrm{A}}(r) = d_{\mathrm{ML}}(r) \tag{31}$$

where

$$d_{\mathrm{A}}(r) \triangleq -\lim_{\rho \to \infty} \frac{\log P(\hat{x}_{\mathrm{A}} \neq x)}{\log \rho} \tag{32}$$

for $x$ uniformly distributed over $\mathcal{X}_r$, and where $\hat{x}_{\mathrm{A}}$ is any
$C$-approximate solution to (11).

Proof: The proof follows from the proof of Theorem 1
provided in Section III-D. In particular, consider $a(\rho)$ and $b(\rho)$
defined in (25). By the assumption in (18) it follows that

$$\lim_{\rho \to \infty} \frac{b(\rho)}{a(\rho)} = \infty .$$

We may thus select $\rho_4 \ge \rho_3$ such that $b(\rho) > C a(\rho)$ for all
$\rho \ge \rho_4$. As the metric for the transmitted codeword $x$ is upper
bounded by $a(\rho)$, and the metric of any other codeword is
lower bounded by $b(\rho)$, it follows that when $\rho \ge \rho_4$, the only
$C$-approximate solution to (11) is $x$, i.e., $\hat{x}_{\mathrm{A}} = x$ for $\rho \ge \rho_4$,
under the assumptions that $\nu_{r+\zeta} \ge 1$ and $\|w\|^2 \le \rho^{\delta}$. The
remaining proof is then analogous to the proof of Theorem 1
in Section III-D. □

(a) Original lattice $\Lambda_r$ and shaping region $\mathcal{R}$.

(b) Image of $\Lambda_r$ and $\mathcal{R}$ under linear map $H$.

(c) Image of $\Lambda_r$ and $\mathcal{R}$ under linear map $B$.

Fig. 1. Transformation of lattice $\Lambda_r$ and spherical shaping region $\mathcal{R}$ under linear map induced by channel $H$ and MMSE filter $B$. Singular vectors of
$H = U \Sigma V^T$, where $U = (u_1, u_2)$ and $V = (v_1, v_2)$, are shown as solid lines for reference. The matrix $B$ is such that $B^T B = I + H^T H$ where $B$
shares left and right singular vectors with $H$. Further, $y = Hx$, $\hat{y} = H\hat{x}$, $r = Bx$, and $\hat{r} = B\hat{x}$.



B. DMT optimality of LR-aided lattice decoding 

The existence of computationally efficient $C$-approximate
solutions is thus of interest for DMT optimal decoding of
lattice designs. Fortunately, such solutions are already known,
both with respect to (11) and to the equivalent MMSE-GDFE
formulation in (14). In fact, as shown in Appendix A, any
$C$-approximate solution to (14) is also a $C$-approximate solution
to (11). Of special interest in the communications context is
Babai's nearest plane algorithm [50], which is equivalent to
the LLL-based [51] LR-aided SIC solution to (14) [25], [30],
[31], [50]. The nearest plane algorithm provides a
computationally efficient $C_1$-approximate solution to (14) with
$C_1 = 2^{n/2}$ [50]. Similarly, the LLL-based LR-aided linear solution to
(14), discussed in [50] as the rounding algorithm, provides
a $C_2$-approximate solution with $C_2 = 1 + 2n(9/2)^{n/2}$. For
completeness, we give the following corollary to Theorem 2.

Corollary 2a: The efficient LLL-based LR-aided linear (or
SIC) implementations of the regularized lattice decoders
provide DMT optimal decoding of any lattice design under the
assumptions made in Theorems 1 and 2.

Proof: The corollary follows by the equivalence of the
LR-aided linear decoder and the rounding algorithm in [50], or of
the LR-aided SIC decoder and the nearest plane algorithm in
[50], in conjunction with Theorem 2. □
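
As an illustration of the LR-aided linear (rounding) decoder, the following sketch LLL-reduces an augmented basis and rounds the least-squares solution in the reduced basis. Several points are assumptions and not taken from the text: the regularized metric is written as an ordinary closest-point problem via the identity $\|y - Hx\|^2 + x^T T x = \|[y; 0] - [H; L^T] x\|^2$ with $T = L L^T$, used here as a stand-in for the MMSE-GDFE formulation of Appendix A; the LLL routine is a plain textbook version (parameter 3/4, Gram-Schmidt recomputed at every step for clarity rather than speed); and the function names and the toy channel are hypothetical.

```python
import numpy as np

def lll_reduce(M, delta=0.75):
    """Textbook LLL on the columns of M. Returns (M_red, U) with M_red = M @ U
    and U an integer unimodular matrix."""
    B = np.array(M, dtype=float)
    n = B.shape[1]
    U = np.eye(n, dtype=int)

    def gram_schmidt(B):
        Bs = np.zeros_like(B)
        mu = np.zeros((n, n))
        for i in range(n):
            Bs[:, i] = B[:, i]
            for j in range(i):
                mu[i, j] = (B[:, i] @ Bs[:, j]) / (Bs[:, j] @ Bs[:, j])
                Bs[:, i] -= mu[i, j] * Bs[:, j]
        return Bs, mu

    k = 1
    while k < n:
        # Size-reduce column k against columns k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(B)
            q = int(np.rint(mu[k, j]))
            if q != 0:
                B[:, k] -= q * B[:, j]
                U[:, k] -= q * U[:, j]
        Bs, mu = gram_schmidt(B)
        lhs = Bs[:, k] @ Bs[:, k]
        rhs = (delta - mu[k, k - 1] ** 2) * (Bs[:, k - 1] @ Bs[:, k - 1])
        if lhs >= rhs:                      # Lovasz condition holds
            k += 1
        else:                               # swap columns k-1 and k
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            U[:, [k - 1, k]] = U[:, [k, k - 1]]
            k = max(k - 1, 1)
    return B, U

def lr_aided_linear_decode(y, H, G, T=None):
    """Rounding decoder in the LLL-reduced augmented basis (assumed formulation)."""
    n = H.shape[1]
    T = np.eye(n) if T is None else T
    L = np.linalg.cholesky(T)               # T = L L^T, so x^T T x = ||L^T x||^2
    M = np.vstack([H, L.T]) @ G             # augmented generator, always full rank
    y_aug = np.concatenate([y, np.zeros(n)])
    M_red, U = lll_reduce(M)
    z_red = np.rint(np.linalg.pinv(M_red) @ y_aug)   # linear estimate + rounding
    return G @ (U @ z_red)

# Assumed toy channel with sigma_1 = 40 * sigma_2 and a scaled integer lattice.
V = np.array([[1.0, 2.0], [-2.0, 1.0]]) / np.sqrt(5.0)
H = np.diag([4.0, 0.1]) @ V.T
G = 0.5 * np.eye(2)

x_hat_zero = lr_aided_linear_decode(np.array([0.0, 0.07]), H, G)
x_hat_other = lr_aided_linear_decode(H @ np.array([0.5, -0.5]), H, G)
```

Replacing the rounding step by successive cancellation in the reduced basis would give the nearest plane (SIC) variant; that variant is not shown here.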

Corollary 2a applies directly to the LR-aided linear
implementation of the MMSE-GDFE decoder [24], [25], [34],
due to the equivalence of the MMSE-GDFE decoder and the
regularized decoder as outlined in Appendix A. The corollary
applies also to the LR-aided MMSE-SIC decoder proposed in
[35], when applied to the equivalent channel

$$y = HGs + w$$

where $s \in \rho^{-rT/n} \mathbb{Z}^n$. Note however that in the latter case we
would have $T = (GG^T)^{-1}$, as opposed to $T = I$, reflecting
a regularization of $s$ rather than $x = Gs$. In the case of perfect
codes [13], where $G = I$, the metrics of the MMSE-GDFE and
the MMSE-SIC decoder coincide.

Corollary 2a applies also to a time-limited implementation
of the Schnorr-Euchner (SE) sphere decoder [23], [52] op- 
erating in the LLL reduced regularized lattice, provided the 
sphere decoder tree-search is allowed to reach the first leaf- 
node. This follows as the first leaf-node found by the SE 
SD corresponds to the Babai-point, i.e., the solution obtained 
by the nearest plane algorithm (c.f., [23]). Finding further 
candidate codewords with smaller metric can only improve 
the approximation ratio. 

C. Decoding complexity 

Both the LR-aided SIC and linear decoders discussed above
begin by LLL reducing the lattice generated by $M = BG$,
where $G$ is the generator matrix of $\Lambda$ and where $B$ is the
MMSE feedback filter (c.f. [6] and Appendix A), followed by
a SIC or linear decoding stage in the reduced basis. Note here
that by the regularization of $B$ the matrix $M$ is always full
rank, which makes the LLL algorithm applicable regardless
of the channel realization and the system dimensionality. The
complexity of the decoding stage is only $O(n^2)$ [30], [31], [35]
while the pre-processing relying on the LLL reduction is more
complex. It is therefore relevant to consider the complexity of
the LLL algorithm when applied to $M$ in order to address
the complexity of DMT optimal decoding of lattice designs.
We refer the reader to [30], [31], [35] for the implementation
details of LR-aided decoders.

The LLL algorithm provides an iterative approach to lattice
reduction [51]. The number $K$ of LLL iterations required to
reduce a given lattice generator matrix $M \in \mathbb{R}^{n \times n}$ may be
bounded according to [44], [53]

$$K \le n^2 \log_s \kappa(M) + n \tag{33}$$

where $s = 2/\sqrt{3}$ and where $\kappa(M)$ denotes the 2-norm
condition number of $M$. Each iteration requires $O(n^2)$ floating
point operations [51]. The number of operations per iteration
may, however, be reduced to $O(n)$ if only an effectively
LLL-reduced basis is required, as is the case when a SIC decoder
is applied in the reduced basis [54].
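
The bound in (33) is straightforward to evaluate for a given basis. The helper below is a small sketch; the function name `lll_iteration_bound` is an assumption, and `numpy.linalg.cond` with `p=2` is used for the 2-norm condition number.

```python
import math
import numpy as np

def lll_iteration_bound(M):
    """Evaluate n^2 * log_s(kappa(M)) + n with s = 2/sqrt(3), as in (33)."""
    n = M.shape[1]
    s = 2.0 / math.sqrt(3.0)
    kappa = float(np.linalg.cond(M, 2))     # 2-norm condition number
    return n ** 2 * math.log(kappa) / math.log(s) + n

bound_identity = lll_iteration_bound(np.eye(3))                       # kappa = 1
bound_s = lll_iteration_bound(np.diag([1.0, 2.0 / math.sqrt(3.0)]))   # kappa = s
bound_10 = lll_iteration_bound(np.diag([1.0, 10.0]))
bound_100 = lll_iteration_bound(np.diag([1.0, 100.0]))
```

For a perfectly conditioned basis the bound reduces to $n$, and it grows only logarithmically with the condition number: squaring $\kappa(M)$ doubles the first term.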

It is important here to note that for arbitrary $M \in \mathbb{R}^{n \times n}$
there is no universal upper bound on the number of iterations
required to reduce $M$ [44]. Thus, the worst-case complexity
of the LLL-based LR-aided decoder is unbounded if applied
to arbitrary channels. However, in order to achieve DMT
optimal performance it is not required to LLL reduce every
conceivable channel. To see this, consider a decoder
implementation which is allowed to time out, and declare an error,
when the number of floating point operations exceeds a given
threshold. Denote the time-out event $\mathcal{T}$, and note that as
long as $P(\mathcal{T}) \,\dot{\le}\, \rho^{-d_{\mathrm{ML}}(r)}$ the time limitation imposed will
not reduce the diversity gain, or potential DMT optimality,
of the decoder. In light of (33) we may thus limit the
application of the LLL algorithm to bases $M = BG$ with
bounded condition number $\kappa(M)$, or allow the decoder the
option to time out, stop, and declare an error. In order to
be able to provide an effective statement regarding the
worst-case decoding complexity under time-outs, we impose here a
moderate restriction on the channels considered.

We say that a channel is power limited if $E\{\|H\|_F^2\} \,\dot{\le}\, \rho$
and note that this is required whenever we wish to interpret
the parameter $\rho$ as an average SNR at the receiver. For the
class of power limited channels we may make the following
statement, proven in Appendix C.

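Lemma 2: For any power limited channel there is some
constant $a > 0$ where for $M = BG$ it holds that

$$P(\kappa(M) \ge \rho^{a}) \,\dot{\le}\, \rho^{-d_{\mathrm{ML}}(r)} \tag{34}$$

provided $d_{\mathrm{ML}}(r) < \infty$.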

By applying Lemma 2, (33) and Corollary 2a together with
the previous discussion, the following statement regarding the
complexity of DMT optimal decoding can thus be made. Note
here that the signal space dimension $n$ is considered fixed and
is thus hidden in the big-O expression.

Theorem 3: For power limited channels, over any range
of multiplexing gains $r$ where $d_{\mathrm{ML}}(r)$ is continuous, DMT
optimal decoding of any lattice design is feasible at a
worst-case complexity of $O(\log \rho)$.

Proof: The theorem follows by imposing the constraint
$\kappa(M) \le \rho^{a}$ in (33), where $a$ is chosen according to Lemma 2,
and noting that such a restriction in the set of channels to
which the decoder is applied does not reduce the diversity. □
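
The step from (33) to the $O(\log \rho)$ statement can be spelled out as follows. This is a sketch under the assumption that the imposed constraint takes the form $\kappa(M) \le \rho^{a}$ and that each LLL iteration costs $O(n^2)$ operations; the constant $\log s = \log(2/\sqrt{3}) \approx 0.144$ (natural logarithm) only affects the prefactor.

```latex
\begin{align*}
K &\le n^2 \log_s \kappa(M) + n
   \le n^2 \log_s \rho^{a} + n
   = \frac{a\, n^2}{\log s}\, \log \rho + n , \\
\text{operations} &\le K \cdot O(n^2)
   = O\!\left( \frac{a\, n^4}{\log s}\, \log \rho + n^3 \right)
   = O(\log \rho) \quad \text{for fixed } n .
\end{align*}
```

Since the rate satisfies $R \doteq r \log \rho$ at high SNR, the same expression is linear in $R$ for fixed $r > 0$.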

Although the bound in Theorem 3 implies an increase in
the LLL LR complexity for increasing SNR, this complexity
only grows linearly in $\log \rho$. By comparing to (9)
it may be seen that this corresponds to a linear increase in
complexity as a function of the rate $R$ at high SNR. The
LLL complexity should also be put in context with the full
search implementation of the ML decoder whose complexity
is $|\mathcal{X}_r|$ and thus exponential in $R$. This also applies to sphere
decoding implementations where the worst-case complexity
reported (see for example [38] for fast decodable codes
[39]-[41]) is also exponential in $R$, albeit with a smaller exponent
than the full search. The same holds true for the hybrid
transceiver in [42] (given $n_T > 2$). All such lattice-based
designs may, however, be DMT optimally decoded using an
LR-aided regularized lattice decoder structure with $O(\log \rho)$
complexity, potentially at some loss in coding gain, but at no
diversity loss.

Finally, we note that in the case where $d_{\mathrm{ML}}(r) = \infty$ the
statement in (34) in Lemma 2 cannot be guaranteed based
on the condition that $E\{\|H\|_F^2\} \,\dot{\le}\, \rho$ alone. However, for
any channel statistics under which $P(\|H\|_F^2 > \rho^{a}) \doteq \rho^{-\infty}$
for some sufficiently large $a$, Theorem 3 still applies. This
includes for instance the quasi-static MIMO channel (c.f.
Section VI-A) under i.i.d. Rayleigh fading, or any other fading
distribution with exponential tails.



D. The search for improved approximation algorithms 

It is in the context of C-approximation algorithms important 
to note that while DMT optimality follows for any finite C, 
the gap in terms of SNR to the optimal implementation of (11)
will in general depend on C. Thus, the loss in performance 
at practical SNR may be unacceptable for unduly large values 
of C. This motivates further study into new approximation 
algorithms, and code designs, that jointly yield improved 
approximation ratios. 

Such methods may include stronger LR methods such as the 
deep insertion LLL variant [52] that is more computationally 
expensive but which finds better bases. Other LR approaches 
include methods based on the Korkine-Zolotareff bases (c.f., 
[23]), and the algebraic lattice reduction approach in [55]. The 
latter method was presented for the 2 × 2 Golden code [56] over
the quasi-static MIMO channel, and approximates the channel 
matrix with the matrix representation of an invertible element 
of the maximal order of the CDA. Codes in which the ML 
decoder may be applied to spaces of reduced dimensionality 
(c.f., [36], [37], as well as [38]-[41]) may benefit from a re- 
duced gap between ML and lattice decoding due to the general
dependence of the approximation constant $C$ on the lattice
dimension. This would suggest the use of transceivers based on
reduced-dimensionality codes and regularized lattice decoding, 
as a good way to further approach ML error performance with 
a reduced SNR penalty. The topic of C-approximate solutions 
is, however, in the context of space-time decoding relatively 
unexplored at this stage. 



V. Generalizations 

In this section we consider a few straightforward
generalizations in terms of the class of designs covered by the results
as well as the modeling assumptions imposed in Section III.

A. Nested lattice designs

In the proof of Theorem 1 and in the lattice designs of
Section III-A we assume a fixed shaping region $\mathcal{R}$, applied
for all $\rho$. This condition could, however, be relaxed in favor of
a sequence of shaping regions $\mathcal{R}(\rho)$, such that $\underline{\mathcal{R}} \subset \mathcal{R}(\rho) \subset \overline{\mathcal{R}}$
for sufficiently large $\rho$ where $\underline{\mathcal{R}}$ and $\overline{\mathcal{R}}$ are fixed "inner"
and "outer" shaping regions that satisfy the conditions in
Section III-A. Such an extension could be of interest for nested
lattice codes [57] involving a shaping lattice $\Lambda_s$ satisfying
$\Lambda_s \subset \Lambda_r$ where $\mathcal{R}$ is the Voronoi region of $\Lambda_s$, i.e., $\mathcal{R} = \mathcal{V}_{\Lambda_s}$
[6], [15], [57]. One option along this line is to let $\Lambda_s = \mu_r \Lambda_r$
where $\mu_r \in \mathbb{N}$ is an appropriately selected integer (i.e.,
self-similar nesting [6]). This will in general require $\mathcal{R}$ to weakly
depend on $\rho$, if we wish the code to be properly defined for all
$r$ and $\rho$. Alternatively, self-similar nested designs could also be
accommodated by replacing the assumption that $\Lambda_r \triangleq \rho^{-rT/n} \Lambda$
by the relaxed assumption $\Lambda_r \triangleq \varphi_r^{-1} \Lambda$ with $\varphi_r \doteq \rho^{rT/n}$, e.g.,
$\varphi_r = \lfloor \rho^{rT/n} \rceil$ where $\lfloor \cdot \rceil$ denotes rounding to the nearest
integer. The proof given in Section III-D straightforwardly extends
to cover these cases, at the expense of somewhat more cumbersome
notation.

B. Random lattice translates (dithering) 

In [6], [15] a random lattice translate, or dither, known to 
both transmitter and receiver was included in the lattice code 
design. The inclusion of a properly chosen random lattice 
translate builds upon a construction in [57] and tends to sim- 
plify the analysis of MMSE receivers by making the MMSE 
estimation error independent of the transmitted codeword. 

In the setup considered herein we may include such a
lattice translate by considering codebooks of the form
$\mathcal{X}_r = (\Lambda_r + u) \cap \mathcal{R}$ where $u$ is the random lattice translate,
possibly dependent on $\rho$ and $r$. This construction allows for
the inclusion of the "mod-$\Lambda$" nested lattice codes considered
in [6], [15]. Note, however, that the specific way in which
the mod-$\Lambda$ construction in [6] maps information messages
to codewords, although important from an implementational
point of view, is irrelevant to the analysis presented herein as
we only consider decoding and not encoding.

The proofs of Theorem 1 and Lemma 1 only need to change
in that $x, \hat{x} \in \Lambda_r + u$ replace $x, \hat{x} \in \Lambda_r$ in order to establish
DMT optimality of the regularized lattice decoder given by

$$\hat{x}_{\mathrm{L}} = \arg \min_{\hat{x} \in \Lambda_r + u} \|y - H\hat{x}\|^2 + \|\hat{x}\|_T^2 .$$

In particular, the bound in (19) holds as is, the bound in (23)
applies to any $\hat{x} \in \tfrac{1}{2} \rho^{\zeta T/n} \mathcal{B} \cap (\Lambda_r + u)$, and (24) applies to any
$\hat{x} \notin \tfrac{1}{2} \rho^{\zeta T/n} \mathcal{B}$ as before. It follows that regularized lattice
decoding is DMT optimal also for designs which include arbitrarily
chosen random or non-random lattice translates. However, it
also follows that no such lattice translate is required for DMT
optimality. Still, as argued in [6], [15], inclusion of a lattice
translate could symmetrize the code, and potentially improve
the characteristics of the code at finite SNR.

C. Noise generalizations 

It is valuable to point out that Theorem 1 is only weakly
dependent on the nature of the additive noise. In fact, the
only parts of the proof that explicitly depend on the Gaussian
assumption are the lower bound on the pairwise error
probability (PEP) in (46) and the step where it is concluded that
$P(\|w\|^2 > \rho^{\delta}) \doteq \rho^{-\infty}$ in Section III-D. Thus, for any noise
statistics under which $P(\|w\|^2 > \rho^{\delta}) \doteq \rho^{-\infty}$ and where
we may assume a non-zero lower bound on the PEP as in
(46), the regularized decoder may be shown to at least match
the diversity of the (mismatched) ML decoder in (5), i.e.,
$d_{\mathrm{L}}(r) \ge d_{\mathrm{ML}}(r)$. In the case of correlated Gaussian noise, the
model in (1) is generally directly applicable after absorbing a
noise whitening filter into the channel matrix.

The noise generalization also proves useful when the noise
component in (1) contains self interference, i.e., $w = Ex + v$
for some stochastic $E \in \mathbb{R}^{m \times n}$ and noise $v$. This encompasses
the partially coherent scenario when the receiver only knows
the channel approximately, in which case $E$ would model the
channel estimation error. Under the assumption that $\|E\|_F$ is
independent of $x$ and $\rho$, which is typically the case when
the channel is estimated using pilots of power proportional to
the transmit signal power, and when $P(\|E\|_F^2 > \rho^{\delta}) \doteq \rho^{-\infty}$,
the previous results apply, in spite of the fact that the noise
is no longer independent of the transmit signal. In particular,
the lower bound of the PEP in (46) applies straightforwardly
by the additive noise alone, and $P(\|w\|^2 > \rho^{\delta}) \doteq \rho^{-\infty}$
follows by the tail assumption on $\|E\|_F^2$. We also note that
the argument in Section III-D does not rely on independence
between $x$ and $w$. Thus, the regularized lattice decoder is
provably good also in some scenarios involving non-perfect
channel state information (CSI) at the receiver.

D. Lower bounds on the diversity 

Finally, consider an arbitrary, continuous, lower bound on
the diversity of the ML decoder, i.e., $d_{\mathrm{ML}}(r) \ge \underline{d}_{\mathrm{ML}}(r)$. It is
clear that (17) holds with $\underline{d}_{\mathrm{ML}}(r)$ in place of $d_{\mathrm{ML}}(r)$. Thus,
(28) and (29) also hold with $\underline{d}_{\mathrm{ML}}(r)$ in place of $d_{\mathrm{ML}}(r)$
and it follows that $d_{\mathrm{L}}(r) \ge \underline{d}_{\mathrm{ML}}(r)$, i.e., the same lower
bound applies to the regularized lattice decoder. Naturally, this
observation may be of interest in scenarios where the diversity
of the ML decoder is discontinuous and/or not explicitly
known.

An important special case is where $d_{\mathrm{ML}}(r) = \infty$ over some
open interval of $r$. The application of a sequence of continuous
lower bounds may be used to establish that $d_{\mathrm{L}}(r) = \infty$ over
the same interval. Of special interest here is the scenario when
lattice decoding of an approximately universal lattice code
(e.g., [8], [15]) is restricted to channels not in outage, in which
case it follows that $d_{\mathrm{L}}(r) = d_{\mathrm{ML}}(r) = \infty$. A direct application
of this result is given in Section VI-D.

VI. Examples 

We proceed by providing a few example scenarios to which
the results developed in the previous section are applicable.
The examples in Sections VI-A, VI-B and VI-C are
straightforward in the sense that they simply establish a distribution for
$H$ in (1), to which Theorems 1, 2 and 3 are directly applicable.
The example in Section VI-D is, however, more involved.

A. The quasi-static MIMO channel

The $n_T$-transmit $n_R$-receive antenna quasi-static
(flat-fading) MIMO channel is commonly given by (c.f. [6])

$$y_t^c = \sqrt{\rho} H^c x_t^c + w_t^c , \quad t = 1, \ldots, T \tag{35}$$

where $H^c \in \mathbb{C}^{n_R \times n_T}$ has some distribution independent of
$\rho$, where $x_t^c \in \mathbb{C}^{n_T}$, $y_t^c \in \mathbb{C}^{n_R}$, and $w_t^c \in \mathbb{C}^{n_R}$, and where
$t$ denotes a time index. The channel may be rewritten in the
form of (1) where $x = [x_1^T, \ldots, x_T^T]^T$ with
$x_t = [\Re(x_t^c)^T \ \Im(x_t^c)^T]^T$,
and where $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary part
respectively, $w = [w_1^T, \ldots, w_T^T]^T$ with
$w_t = [\Re(w_t^c)^T \ \Im(w_t^c)^T]^T$, and

$$H = I_T \otimes \sqrt{\rho} \begin{bmatrix} \Re(H^c) & -\Im(H^c) \\ \Im(H^c) & \Re(H^c) \end{bmatrix} . \tag{36}$$



The channel in (35) is also often written in an equivalent
matrix form

$$Y^c = \sqrt{\rho} H^c X^c + W^c \tag{37}$$

where $X^c = [x_1^c, \ldots, x_T^c]$ and $W^c = [w_1^c, \ldots, w_T^c]$. Under
the short-term average input power constraint (38)
and an appropriate scaling of $H^c$, the parameter $\rho$ takes on
the interpretation of an average signal-to-noise ratio (SNR) per
receive antenna (c.f. [5], [6]).
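
The passage from the complex matrix model to the real-valued form of (1) can be checked numerically. The sketch below assumes the standard real-valued decomposition, stacking real and imaginary parts per time slot and building $H$ as a Kronecker product; the function names `real_equivalent` and `stack_real`, the dimensions, and the SNR value are illustrative assumptions, and noise is omitted.

```python
import numpy as np

def real_equivalent(Hc, rho, T):
    """Real-valued channel for T uses of y_t = sqrt(rho) * Hc @ x_t (assumed stacking)."""
    Hs = np.sqrt(rho) * Hc
    block = np.block([[Hs.real, -Hs.imag],
                      [Hs.imag, Hs.real]])
    return np.kron(np.eye(T), block)        # block-diagonal over the T time slots

def stack_real(Zc):
    """Map columns z_1..z_T of a complex matrix to [Re z_1; Im z_1; ...; Re z_T; Im z_T]."""
    return np.concatenate([np.concatenate([z.real, z.imag]) for z in Zc.T])

rng = np.random.default_rng(0)
n_R, n_T, T, rho = 2, 3, 4, 10.0
Hc = rng.standard_normal((n_R, n_T)) + 1j * rng.standard_normal((n_R, n_T))
Xc = rng.standard_normal((n_T, T)) + 1j * rng.standard_normal((n_T, T))

Yc = np.sqrt(rho) * Hc @ Xc                 # noiseless complex matrix model
H = real_equivalent(Hc, rho, T)
y = H @ stack_real(Xc)                      # noiseless real-valued model y = H x
```

The real-valued channel has dimensions $2 n_R T \times 2 n_T T$, so the lattice dimension seen by the decoder is $n = 2 n_T T$ in this sketch.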

B. The parallel MIMO channel (MIMO-OFDM) 

A natural extension of the quasi-static MIMO channel is the
$n_T \times n_R$ parallel, or MIMO-OFDM, channel. In this setting

$$Y_\ell^c = \sqrt{\rho} H_\ell^c X_\ell^c + W_\ell^c , \quad \ell = 1, \ldots, L \tag{39}$$

where $X_\ell^c = [x_{\ell,1}^c, \ldots, x_{\ell,T}^c] \in \mathbb{C}^{n_T \times T}$ denotes the complex
space-time block codeword transmitted over the $\ell$th
sub-channel in the $T$ time-slots, and where $H_\ell^c \in \mathbb{C}^{n_R \times n_T}$ is
the channel matrix for the $\ell$th sub-channel. Similar to the
flat fading quasi-static channel, it is clear by the linearity
of (39) that the parallel channel can be rewritten according
to (1). Coding across the parallel channels is achieved by
the appropriate choice of generator matrix $G$. For the rate
definition it is conventional to consider one use of (39) as $LT$
channel uses.

Naturally, the DMT characteristics of the parallel channel
depend on the statistics of $[H_1^c, \ldots, H_L^c]$. In the particular
case where $H_\ell^c$ for $\ell = 1, \ldots, L$ represent the OFDM tones
for a $Q$-tap i.i.d. Rayleigh fading channel, i.e.

$$H_\ell^c = \sum_{q=0}^{Q-1} \tilde{H}_q^c \, e^{-i 2 \pi q \ell / L} , \quad \ell = 1, \ldots, L ,$$

where $\tilde{H}_q^c \in \mathbb{C}^{n_R \times n_T}$, $q = 0, \ldots, Q - 1$, are stochastically
independent i.i.d. Rayleigh fading taps in the time domain, the
maximal diversity gain is $f_Q(r)$ where $f_Q(r)$ is given by the
piecewise linear curve connecting $(k, (Q\overline{n} - k)(\underline{n} - k))$ for
$k = 1, \ldots, \underline{n}$ where $\overline{n} = \max(n_R, n_T)$ and $\underline{n} = \min(n_R, n_T)$
respectively [58]. Generalizations of this result, to more
complicated scenarios, are found in [59].

Lattice designs for which $d_{\mathrm{ML}}(r) = f_Q(r)$ for all $r \in [0, \underline{n}]$
were given in [19], [60] for particular values of $n_T$ and $L$ and
in [61] for the general case of $n_T, L$. Due to the continuity
of $f_Q(r)$ we may conclude that low complexity and DMT
optimal decoding of these codes is possible, i.e., there exist
computationally efficient explicit and DMT optimal transceiver
designs for the parallel MIMO channel. The results extend to
any statistics under which $d_{\mathrm{ML}}(r)$ is continuous.

C. The amplify-and-forward relay channel

Over the amplify-and-forward (AF) relay channel, one or 
several relays amplify and retransmit the signal received in 
previous time-slots, in order to aid the transmission of data 
from a source to destination. An initial, orthogonal, version 
of this scenario was in the DMT context studied in [62]. As 
an example, we here consider another AF protocol, namely 
the single-antenna single relay non-orthogonal amplify and 
forward (NAF) protocol proposed in [63], operating over a 
quasi-static channel. We omit constant transmit power scaling 
factors for brevity. One transmission from the source followed 
by a joint source relay transmission may be modeled according 
to (c.f. [64]) 



Vt 



phh'^h'^ 












x1 + 


y/pbh^ 



(40) 



where h^, and are the complex gains from source to 
destination, source to relay, and relay to destination respec- 
tively. The term represents the receiver noise at the relay 
and the noise at the destination. The relay amplification b 
is in general allowed to depend on p and and must satisfy 



< 



1 



(41) 



in order to meet the relay transmit power constraint. After
noise whitening (40) becomes equivalent to (35) with



H'^ 







(42) 



where one transmission over (42) corresponds to two channel
uses in the definition of the rate. As argued in [18], any
approximately universal code designed for the 2 × 2
quasi-static MIMO channel is able to achieve a diversity gain of
$d_{\mathrm{ML}}(r) = (1 - r)^+ + (1 - 2r)^+$, under AWGN noise and i.i.d.
Rayleigh fading assumptions, provided $b$ is properly selected.
This also corresponds to the maximal diversity over the class
of linear AF protocols [64]. We see that the AF protocol
defines a (somewhat complicated) set of channel statistics,
parameterized by $\rho$. It follows directly by the continuity of
$d_{\mathrm{ML}}(r)$ that $d_{\mathrm{L}}(r) = d_{\mathrm{ML}}(r)$ over $r \in [0, 1]$.

There are several generalizations of AF protocols to more
relays and different relay actions [20], [64], [65]. Common
to these settings is that the particular AF protocol determines
the statistics of the equivalent channel in (1), similar to (42).
Lattice designs for some of these generalizations are found in
[18], [20]. The application of Theorems 1, 2 and 3 is
straightforward to most, if not all, lattice designs in these settings, once
the AF protocol is established. Note that the lattice designs
in [18], [20] provide for approximate universality over these
system models, and as a result, $d_{\mathrm{L}}(r)$ is optimal in these
settings. Note also that even for scenarios where $d_{\mathrm{ML}}(r)$ is
not known, it follows from the discussion in Section V-D that
any continuous lower bound on $d_{\mathrm{ML}}(r)$ applies also to $d_{\mathrm{L}}(r)$.

D. The L-round MIMO-ARQ channel 

Consider the $L$-round MIMO-ARQ setting where, as in
[66], signaling of the information across the $n_T \times n_R$
quasi-static MIMO channel uses an $L$-round automatic
retransmission request (ARQ) protocol that assumes the presence of a
noiseless feedback channel conveying one bit of information
per use of the feedback channel. During the $\ell$th round, an
$n_T \times T$ code-matrix $X_\ell$ is transmitted where $[X_1, \ldots, X_L] \in
\mathcal{X} \subset \mathbb{C}^{n_T \times LT}$, and a decoder $\mathcal{D}_\ell$ is applied to decode the
fragment $[Y_1, \ldots, Y_\ell]$ (c.f. (37) and (39)) corresponding to
the fragmented code $[X_1, \ldots, X_\ell] \in \mathbb{C}^{n_T \times \ell T}$ with
multiplexing gain $r_\ell = r_1 / \ell$. The decoder $\mathcal{D}_\ell$ either generates an
acknowledgment (ACK) in which case a hard decision is made
and the transmission of that message terminates, or generates
a negative acknowledgment (NACK) in which case another
transmission round is requested. The last decoder $\mathcal{D}_L$ always
tries to decode the message. An error is considered only when
a message is decoded erroneously. The DMT characteristics of
the MIMO-ARQ channel were first considered in [66] where
also the optimal DMT was obtained under two different fading
models. We shall for the sake of brevity only consider long-term
fading where the channel $H^c$ remains constant over the $L$
rounds. We show in what follows how the results obtained
herein can be applied to prove DMT optimality of lattice
coding and LR-aided linear decoding for the MIMO-ARQ
channel, for all $n_R$, $n_T$, $L$ and fading statistics.

To this end, let $\mathcal{A}_1$ denote the event that a NACK is
requested in the first round, and let
$r_{\max} = \sup\{r \mid d_{\mathrm{out}}(r) > 0\}$,
where $d_{\mathrm{out}}(r)$ denotes the optimal DMT for $L = 1$,
i.e., in the absence of feedback. We assume that $d_{\mathrm{out}}(r)$ is
continuous over $r \in [0, r_{\max})$. As in [21] we consider in
parallel a fictitious system where $[X_1, \ldots, X_L]$ is transmitted
and where each of the decoders $\mathcal{D}_\ell$ operates independently on
each of the fragments $[Y_1, \ldots, Y_\ell]$, $\ell = 1, \ldots, L$.
Let $P_{e,\ell}(r_\ell)$
denote the probability of error of $\mathcal{D}_\ell$ in the fictitious system,
and let $P_e(r)$ denote the overall probability of error at the
expected or average multiplexing gain $r$. The work in [21]
provides, based on the work in [66], the following sufficient
conditions for overall DMT optimality in the MIMO-ARQ
setting.

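1) $P(\mathcal{A}_1) \,\dot{\leq}\, \rho^{-\epsilon}$, $\epsilon > 0$.

2) $P_{e,\ell}(r_\ell) \,\dot{\leq}\, P_{e,L}(r_L)$, $\ell = 1, \ldots, L - 1$.

3) $P_{e,L}(r_L) \doteq \rho^{-d_{\mathrm{out}}(r_L)}$.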

In brief, optimality follows from the above by observing that

$$P_{e,L}(r_L) \,\dot{\leq}\, P_e(r) \,\dot{\leq}\, \sum_{\ell=1}^{L} P_{e,\ell}(r_\ell),$$

which by the second condition implies that $P_e(r) \doteq P_{e,L}(r_L)$.
Based on the first condition it may be shown that $r = r_1$ (cf.
[21]) and by the third condition it follows that

$$P_e(r) \doteq \rho^{-d_{\mathrm{out}}(r_L)} = \rho^{-d_{\mathrm{out}}(r/L)},$$

which corresponds to the maximal ARQ diversity [66]. The
reader is referred to [21] for a detailed analysis.
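To give a feel for the size of the ARQ gain, the following sketch evaluates the exponent $d_{\mathrm{out}}(r/L)$ associated with the maximal ARQ diversity under long-term fading. It assumes i.i.d. Rayleigh fading, for which the $L = 1$ outage exponent is the piecewise-linear curve through the points $(k, (n_T - k)(n_R - k))$; this fading model and the function names `d_out_rayleigh` and `d_arq` are illustrative assumptions, whereas the argument above applies to general fading statistics.

```python
def d_out_rayleigh(r, n_t, n_r):
    """Outage exponent for i.i.d. Rayleigh fading (assumed model).

    Piecewise-linear curve through (k, (n_t - k) * (n_r - k)), k = 0..min(n_t, n_r).
    """
    m = min(n_t, n_r)
    if r < 0 or r > m:
        raise ValueError("multiplexing gain outside [0, min(n_t, n_r)]")
    k = min(int(r), m - 1)                  # linear segment [k, k + 1] containing r
    d_lo = (n_t - k) * (n_r - k)            # exponent at integer gain k
    d_hi = (n_t - k - 1) * (n_r - k - 1)    # exponent at integer gain k + 1
    return d_lo + (r - k) * (d_hi - d_lo)


def d_arq(r, n_t, n_r, rounds):
    """ARQ exponent d_out(r / L) for L = rounds under long-term fading."""
    return d_out_rayleigh(r / rounds, n_t, n_r)


if __name__ == "__main__":
    for rounds in (1, 2, 4):
        print(rounds, d_arq(1.0, 2, 2, rounds))
```

For a 2 x 2 channel at multiplexing gain 1, the exponent grows with the number of rounds and approaches the full diversity of 4 as $L$ increases.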

Now, let each $\mathcal{D}_\ell$ apply regularized lattice decoding, and
an ACK-NACK policy similar to [21], [66] where an ACK is
generated if and only if

$$\log\det\left(I + \rho H_C^H H_C\right) > x \log\rho > r_1 \log\rho, \tag{43}$$

for some $x$ such that $r_1 < x < r_{\max}$. This ACK-NACK
policy is independent of $r = r_1$, provided $r_1 < x$. Consider
now the application of a code where each fragment code is
approximately universal. Explicit lattice codes of this type
are provided in [22]. As (43) implies that the decoders for
$\ell = 1, \ldots, L - 1$ are only applied to channels not in outage,
it follows, as explained in Section IV-D, by the approximate
universality of the fragment codes that
$P_{e,\ell}(r_\ell) \,\dot{\leq}\, P_{e,L}(r_L)$ for
$\ell = 1, \ldots, L - 1$. For $\ell = L$ it follows directly by Theorems 1
and 2 that $P_{e,L}(r_L) \doteq \rho^{-d_{\mathrm{out}}(r_L)}$.
Regarding $\mathcal{A}_1$, a NACK is declared in the first round exactly
when the channel is in outage at multiplexing gain $x$, so that
$P(\mathcal{A}_1) \doteq \rho^{-d_{\mathrm{out}}(x)}$ where $d_{\mathrm{out}}(x) > 0$ as
$x < r_{\max}$, establishing the DMT optimality of the regularized
lattice decoder for $r \in [0, r_{\max})$ when applied to the codes
proposed in [22].
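The ACK-NACK rule in (43) is a threshold test on a log-determinant, and can be sketched as follows. This is a minimal illustration rather than the implementation of [21] or [66]: the function names `logdet_hermitian_pd` and `ack`, the representation of the channel as a list of rows of complex numbers, and the example values of $\rho$ and $x$ are all assumptions.

```python
import math


def logdet_hermitian_pd(a):
    """log det of a Hermitian positive definite matrix via Cholesky, a = l l^H."""
    n = len(a)
    l = [[0j] * n for _ in range(n)]
    total = 0.0
    for j in range(n):
        diag = a[j][j].real - sum(abs(l[j][k]) ** 2 for k in range(j))
        l[j][j] = math.sqrt(diag)
        total += math.log(diag)             # log det = sum_j log(l_jj^2)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k].conjugate() for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return total


def ack(h, rho, x):
    """Return True (ACK) iff log det(I + rho * H^H H) > x * log(rho)."""
    n_r, n_t = len(h), len(h[0])
    # Gram matrix H^H H of the n_r x n_t channel h
    gram = [[sum(h[k][i].conjugate() * h[k][j] for k in range(n_r))
             for j in range(n_t)] for i in range(n_t)]
    a = [[rho * gram[i][j] + (1.0 if i == j else 0.0) for j in range(n_t)]
         for i in range(n_t)]
    return logdet_hermitian_pd(a) > x * math.log(rho)


if __name__ == "__main__":
    good = [[1 + 0j, 0j], [0j, 1 + 0j]]
    weak = [[0.01 + 0j, 0j], [0j, 0.01 + 0j]]
    print(ack(good, 100.0, 1.5), ack(weak, 100.0, 1.5))
```

The threshold `x` is fixed in advance and does not depend on the multiplexing gain of the code, which mirrors the requirement that the policy be independent of $r$.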

We remark that the DMT optimality of lattice coding and 
decoding for the MIMO-ARQ channel was in fact proven 
already in [66], albeit under the assumption of i.i.d. Rayleigh 
fading and $T \geq n_R + n_T - 1$, using a random construction
similar to [6]. The argument presented above extends this
result to LR-aided linear decoding, the minimum delay setting
($T = n_T$) and more general fading statistics.

E. Further examples and lattice designs 

The examples given above only constitute a subset of the 
scenarios to which the main results presented herein are 
applicable. For instance, ISI channels and generally selective 
fading channels [59] may be handled similarly to the parallel 
channel in Section VI-B. The finite rate feedback scenarios
and long term power allocation policies considered in [67] are
handled similarly to the MIMO-ARQ channel in Section VI-D.
Dynamic decode-and-forward (DDF) protocols, where relays
decode and forward a received message whenever the relevant
channels are not in outage, are also handled similarly to
the MIMO-ARQ channel. The results extend to cover orthogonal
amplify and forward (OAF) as well as orthogonal
and non-orthogonal selection decode and forward (OSDF
and NSDF) relay protocols [20]. Approximately universal
distributed codes exist for several such cooperative protocols
and scenarios, see e.g., [20], [62]-[64], [68], and the regularized
lattice decoders and their LR-aided linear counterparts
achieve the corresponding approximate universality in these
settings.

Note on the ACK-NACK policy in (43): its independence of $r$ is a
technical requirement for the application of Theorem 1 that stems
from the fact that we assume the statistics of $H$ in (1) to be independent of the
multiplexing gain of the code applied. Note, however, that the independence
is only required in a neighborhood of the target multiplexing gain.
TABLE I

LATTICE DIMENSIONALITY AND REFERENCES FOR EXPLICIT
TRANSCEIVERS IN DIFFERENT SETTINGS

Channel                                     $n$             Lattice source
$m \times m$ MIMO                           $2m^2$          [8], [11], [15]
$m \times m$, $L$-tone MIMO-OFDM            $2m^2 L$        [19], [20], [60]
$m \times m$, $m$-round MIMO-ARQ            $2m^2$          [21]
$m \times m$, $L$-round MIMO-ARQ (AU)       $2m^2 L$        [21]
$m$-relay OAF                               $2m$            [20]
2-relay OSDF NSDF ($r = 2$)                 32,162          [20]
$m$-relay NAF                               $8(m - 1)$      [18]
                                            $8(m - 1)^2$    [20]
$m$-relay DDF, $L$-slots, $m > 2$           $2m^2 L$        [22]


Table I identifies the lattice dimensionality employed by
DMT optimal implementations for different channels, and
refers the reader to explicit descriptions of the designs.* The
potentially very large lattice dimensions faced when decoding
such designs make reduced-complexity decoders essential to
their successful deployment.

VII. Conclusion 

The work presented an explicit characterization of effi- 
cient encoder-decoder structures that meet the fundamental 
DMT performance limits, and do so for very general channel 
statistics, dimensions, and models. Specifically, it proved that 
regularized lattice decoders, and the MMSE-GDFE decoder, 
provide DMT optimal decoding in its most general form, 
irrespective of the particular code applied. It also established, 
for the first time, that computationally efficient LR-aided 
linear decoders are capable of achieving the entire DMT. The 
generality of the results obtained makes them applicable to a
plethora of pertinent communication scenarios which inherently
introduce non-standard channel statistics, code-structure
limitations and prohibitively high ML-decoding complexity.

In terms of information theoretic guarantees on error probability
performance, the work extended the prior state of the art to a
very general setting. In terms of implementability, the work
bridged the gap between proving the
existence of non-ML optimal transceivers and
establishing what these transceivers are and how they can
be efficiently applied. In terms of complexity guarantees,
the work provides worst-case guarantees on the complexity 
required for DMT optimality. This is done despite the fact 
that the employed algorithms are generally known to have 
unbounded worst-case complexity. 

In terms of generality over codes, dimensions and channel 
statistics, we observe the following: Generality with respect 
to the codes addresses issues of legacy, and guarantees that 
the efficient regularized decoder structure will maintain, in 
most circumstances, the ML decoder DMT performance of 
the existing code structure. The generality thus also applies to 
communication scenarios which place restrictions on the form 
of the codes applied. 

*In the case of OAF and m-round MIMO-ARQ, DMT optimality is limited 
to a class of channels. All relay channels consider single-antenna nodes. 



Generality with respect to channel dimensions is pertinent
to computationally demanding scenarios that involve encoding
over a large number of degrees of freedom, such as multi-toned
OFDM, multi-tap ISI, as well as multi-round MIMO-ARQ
and multi-slot DDF channels. In all the above, error
probability performance gains require an increasing number
of rounds/slots, which in turn results in linear increases in
the problem dimensionality and exponential increases in the
ML decoding complexity. The same generality with respect to
dimension bypasses issues of channel asymmetry, and
allows for a unified exposition of the problem.

Finally, generality with respect to fading statistics main- 
tains the pertinent asymptotic guarantees to cases where the 
underlying fading and noise statistics are not entirely known, 
specifically to scenarios which inherently introduce hard to 
characterize channels such as different cooperative relaying 
protocols, as well as MIMO-OFDM and time-varying channels 
with arbitrary correlations. 

In terms of practicality, the presented transceivers allow for 
a broad spectrum of rate-reliability-complexity guarantees that 
result in near-optimal transmission energy, and reduced algo- 
rithmic power consumption and delay. Under the requirement 
for non-exponentially complex decoders, the work also allows 
for these rate-reliability guarantees in the presence of reduced 
hardware complexity, such as for example with a minimum 
number of transmit and receive antennas. Furthermore, the
efficient and universal applicability of the transceivers over
different system models allows for further diversification of
resources over hybrid channels that near-optimally induce
further gains in performance. In terms of future work, the
results naturally motivate further joint study into new ap- 
proximation algorithms and code designs that together yield 
improved approximation ratios, and better performance in the 
non-asymptotic regime. 

Appendix A 
Equivalence of the MMSE-GDFE and the 
Regularized Lattice Decoder 

By "completion of squares" the regularized metric in (11)
may be written according to

$$
\begin{aligned}
\|y - Hx\|^2 + x^T T x &= x^T H^T H x - 2 y^T H x + y^T y + x^T T x \\
&= x^T B^T B x - 2 y^T F^T B x + y^T F^T F y + r \\
&= \|Fy - Bx\|^2 + r
\end{aligned} \tag{44}
$$

where $B$ is any matrix for which $B^T B = H^T H + T$,
where $F = B^{-T} H^T$ and where

$$r = y^T \left[ I - H (H^T H + T)^{-1} H^T \right] y \geq 0.$$

As $r$ does not depend on $x$ it may be disregarded in the
optimization over $x$, i.e., the regularized lattice decoder may
be alternatively expressed as

$$\hat{x}_L = \arg\min_{x \in \Lambda_r} \|Fy - Bx\|^2. \tag{45}$$

Comparing $B$, $F$, and (45) or (14) to the corresponding
expressions in [6] establishes the equivalence of the regularized
decoder and the MMSE-GDFE decoder when $T = I$.
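The identity in (44) is easy to check numerically. The sketch below does so for a small real-valued example, taking $B$ as the upper-triangular Cholesky factor of $H^T H + T$ and computing $Fy$ as the solution $z$ of $B^T z = H^T y$, i.e. $F = B^{-T} H^T$, which is the choice for which the cross terms in (44) match. The function names `cholesky_upper` and `metrics` and the test data are illustrative assumptions.

```python
import math


def cholesky_upper(a):
    """Upper-triangular b with b^T b = a, for symmetric positive definite a."""
    n = len(a)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = a[i][j] - sum(b[k][i] * b[k][j] for k in range(i))
            b[i][j] = math.sqrt(s) if i == j else s / b[i][i]
    return b


def metrics(h, t, y, x):
    """Return (||y - Hx||^2 + x^T T x, ||Fy - Bx||^2, r) for real inputs."""
    m, n = len(h), len(h[0])
    hx = [sum(h[i][j] * x[j] for j in range(n)) for i in range(m)]
    regularized = (sum((y[i] - hx[i]) ** 2 for i in range(m))
                   + sum(x[i] * t[i][j] * x[j] for i in range(n) for j in range(n)))
    # a = H^T H + T and its Cholesky factor B (so that B^T B = a)
    a = [[sum(h[k][i] * h[k][j] for k in range(m)) + t[i][j] for j in range(n)]
         for i in range(n)]
    b = cholesky_upper(a)
    # z = F y = B^{-T} H^T y, by forward substitution on B^T z = H^T y
    hty = [sum(h[k][i] * y[k] for k in range(m)) for i in range(n)]
    z = [0.0] * n
    for i in range(n):
        z[i] = (hty[i] - sum(b[k][i] * z[k] for k in range(i))) / b[i][i]
    bx = [sum(b[i][j] * x[j] for j in range(i, n)) for i in range(n)]
    r = sum(v * v for v in y) - sum(v * v for v in z)   # y^T y - ||F y||^2
    gdfe = sum((z[i] - bx[i]) ** 2 for i in range(n))
    return regularized, gdfe, r


if __name__ == "__main__":
    h = [[1.0, 0.5], [0.2, 1.5], [0.3, -0.7]]
    t = [[1.0, 0.0], [0.0, 1.0]]
    reg, gdfe, r = metrics(h, t, [0.4, -1.2, 2.0], [1.0, -2.0])
    print(reg, gdfe + r, r)
```

Since `r` does not depend on `x`, minimizing the first returned value over a lattice is the same as minimizing the second, which is the content of (45).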
Further, if $\hat{x}_A$ is a $C$-approximate solution to (45), i.e., if

$$C \|Fy - B\hat{x}_L\|^2 \geq \|Fy - B\hat{x}_A\|^2,$$

for $C \geq 1$, it follows that

$$C \left( \|Fy - B\hat{x}_L\|^2 + r \right) \geq \|Fy - B\hat{x}_A\|^2 + r,$$

which by (44) implies that $\hat{x}_A$ is also a $C$-approximate
solution to (11).

Appendix B 
Proof of Lemma 1

Consider the conditional probability of ML decoder error at
multiplexing gain $r$ given that $x \in \mathcal{B}$ and $\nu_r < 1$. As $\nu_r < 1$
there is $d \in \mathcal{B} \cap \Lambda_r$, $d \neq 0$, such that
$\nu_r = \frac{1}{4}\|Hd\|^2 < 1$ by
the definition in (16). Let $\tilde{x} = x + d$, and note that
$\tilde{x} \in \mathcal{R} \cap \Lambda_r$
as $d, x \in \mathcal{B}$ and $d, x \in \Lambda_r$. In other words, $\tilde{x}$ is a valid
codeword in $\mathcal{X}_r$. The probability that $\tilde{x}$ achieves an ML metric
which is lower than that of $x$ is given by the standard pairwise error
probability [28], i.e.,

$$P(x \to \tilde{x} \mid x \in \mathcal{B}, H) = Q\left(\tfrac{1}{2}\|Hd\|\right) \geq Q(1) > 0 \tag{46}$$

where the lower bound by $Q(1)$ follows by the assumption that
$\nu_r = \frac{1}{4}\|Hd\|^2 < 1$, and where $Q(\cdot)$ is the Q-function.
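The point of (46) is that the pairwise error probability is bounded below by $Q(1)$, a constant that does not decay with $\rho$. A short sketch evaluates this constant using the standard relation between the Q-function and the complementary error function; the function name `q_function` is an illustrative choice.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    # Q is decreasing, so any argument below 1 gives a value above Q(1).
    print(q_function(1.0))
    print(q_function(0.5) > q_function(1.0))
```

Numerically, $Q(1)$ is approximately 0.1587, so conditioned on the events considered the ML decoder errs with a probability that stays bounded away from zero.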

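Let $\hat{x}_{\mathrm{ML}} \in \mathcal{X}_r$ be the output of the ML decoder. It follows
that

$$P(\hat{x}_{\mathrm{ML}} \neq x) \geq P(\hat{x}_{\mathrm{ML}} \neq x \mid x \in \mathcal{B}, \nu_r < 1)\, P(x \in \mathcal{B})\, P(\nu_r < 1),$$

where we use the independence of $x$ and $H$ (and thus also of $x$
and $\nu_r$). By (46) it follows that
$P(\hat{x}_{\mathrm{ML}} \neq x \mid x \in \mathcal{B}, \nu_r < 1) \doteq \rho^0$.
By applying the same approximation as in (9) it may,
provided $r > 0$, be shown (cf. [46]) that

$$\lim_{\rho \to \infty} P(x \in \mathcal{B}) > 0$$

when $x$ is uniformly distributed over $\mathcal{X}_r = \mathcal{R} \cap \Lambda_r$. This
implies that $P(x \in \mathcal{B}) \doteq \rho^0$. It follows that

$$P(\nu_r < 1) \,\dot{\leq}\, P(\hat{x}_{\mathrm{ML}} \neq x),$$

which is equivalent to (17). □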

Appendix C 
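Proof of Lemma 2

Assume that

$$P\left(\|H\|_F^2 > \rho^x\right) \geq \rho^{-d_{\mathrm{ML}}(r)} \tag{47}$$

for sufficiently large $\rho$. It then follows, by Markov's inequality, that

$$E\{\|H\|_F^2\} \geq \rho^{x - d_{\mathrm{ML}}(r)}.$$

Thus, if $E\{\|H\|_F^2\} \,\dot{\leq}\, \rho$ it holds that

$$P\left(\|H\|_F^2 > \rho^x\right) < \rho^{-d_{\mathrm{ML}}(r)} \tag{48}$$

for any $x > d_{\mathrm{ML}}(r) + 1$. Let $M = BG$ where $G$ is the code
lattice generator and $B^T B = H^T H + T$ (cf. Appendix A).
It holds that

$$\kappa(M)^2 = \frac{\lambda_{\max}(M^T M)}{\lambda_{\min}(M^T M)},$$

where $\lambda_{\max}(M^T M)$ and $\lambda_{\min}(M^T M)$ denote the largest
and smallest eigenvalues of $M^T M$. Note that $M^T M = G^T B^T B G$.
As $\lambda_{\min}(M^T M) \geq \lambda_{\min}(G^T T G) > 0$ and
$\lambda_{\max}(M^T M) \leq \lambda_{\max}(G^T H^T H G) + \lambda_{\max}(G^T T G)$, where
$\lambda_{\max}(G^T H^T H G) \leq \lambda_{\max}(G^T G) \|H\|_F^2$, it follows that

$$\kappa(M)^2 \leq \frac{\lambda_{\max}(G^T G)\|H\|_F^2 + \lambda_{\max}(G^T T G)}{\lambda_{\min}(G^T T G)}. \tag{49}$$

For $a > 0$ and any $x < 2a$ it follows by (49) that for sufficiently large $\rho$

$$\|H\|_F^2 \leq \rho^{x} \;\Rightarrow\; \kappa(M) \leq \rho^{a}.$$

Thus, by (48) it follows that

$$P(\kappa(M) > \rho^a) < \rho^{-d_{\mathrm{ML}}(r)}$$

for any $a > \frac{1}{2}(d_{\mathrm{ML}}(r) + 1)$.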
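The eigenvalue bound on the conditioning of $M = BG$ in (49) can be checked numerically. The sketch below does so for a two-dimensional lattice, using the closed-form eigenvalues of a symmetric 2 x 2 matrix and the identity $M^T M = G^T (H^T H + T) G$, so that $B$ itself is never needed. It takes $\kappa(M)^2 = \lambda_{\max}(M^T M)/\lambda_{\min}(M^T M)$; the helper names and the example matrices are illustrative assumptions.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def eig_sym_2x2(s):
    """(smallest, largest) eigenvalue of a symmetric 2 x 2 matrix."""
    mean = 0.5 * (s[0][0] + s[1][1])
    dev = math.hypot(0.5 * (s[0][0] - s[1][1]), s[0][1])
    return mean - dev, mean + dev


def kappa_sq_and_bound(h, g, t):
    """Return kappa(M)^2 and the upper bound in (49) for n = 2."""
    gt = transpose(g)
    hth = matmul(transpose(h), h)
    a = [[hth[i][j] + t[i][j] for j in range(2)] for i in range(2)]
    lam_min, lam_max = eig_sym_2x2(matmul(matmul(gt, a), g))   # spectrum of M^T M
    fro_sq = sum(v * v for row in h for v in row)               # ||H||_F^2
    _, gg_max = eig_sym_2x2(matmul(gt, g))
    gtg_min, gtg_max = eig_sym_2x2(matmul(matmul(gt, t), g))
    bound = (gg_max * fro_sq + gtg_max) / gtg_min
    return lam_max / lam_min, bound


if __name__ == "__main__":
    h = [[3.0, 1.0], [0.5, 2.0]]
    g = [[1.0, 0.3], [0.0, 0.8]]
    t = [[1.0, 0.0], [0.0, 1.0]]
    print(kappa_sq_and_bound(h, g, t))
```

The bound depends on the channel only through $\|H\|_F^2$, which is why a tail bound on the channel norm translates directly into a tail bound on $\kappa(M)$.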